July 7, 2025

Testing GPU Numerics: Finding Numerical Differences Between NVIDIA and AMD GPUs

When you run the same GPU program on an NVIDIA GPU and an AMD GPU, you might expect identical results. That is not always the case: even small floating-point differences can lead to divergent outcomes in high-performance computing (HPC) and machine learning workloads.

Our SC24-W workshop paper presents a systematic method to detect and analyze these cross-vendor numerical differences.


🔍 Why This Problem Exists

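Numerical results can differ between GPUs because of hardware-specific rounding, differences in math library implementations, precision truncation or extension, and algorithm choice (e.g., blocked vs unblocked GEMM).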

In HPC, where bitwise reproducibility can be crucial for scientific validation, these differences matter.


🎯 Our Research Goals

  1. Detect: Identify workloads where NVIDIA and AMD GPUs produce different outputs.
  2. Quantify: Measure the severity of differences using ULP (Units in the Last Place).
  3. Explain: Trace differences back to likely causes.
  4. Guide: Offer strategies for developers to control or mitigate discrepancies.
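
To make the ULP metric from the "Quantify" goal concrete, here is a minimal sketch of a ULP distance for single-precision values. It is an illustration, not the code used in the paper: the function names are hypothetical, it assumes IEEE 754 binary32 values, and it does not handle NaN.

```python
import struct

def float32_bits(x: float) -> int:
    """Round x to float32 and return its bit pattern as an unsigned 32-bit int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def ordered(bits: int) -> int:
    """Map a float32 bit pattern to an integer that grows with the float value."""
    # Negative floats have the sign bit set and grow in magnitude as the
    # remaining bits grow, so mirror them onto the negative integers.
    if bits & 0x80000000:
        return -(bits & 0x7FFFFFFF)
    return bits

def ulp_distance(a: float, b: float) -> int:
    """Number of float32 steps between a and b (0 if they round to the same value)."""
    return abs(ordered(float32_bits(a)) - ordered(float32_bits(b)))

# 1.0 and its float32 successor are exactly one step apart.
print(ulp_distance(1.0, 1.0 + 2**-23))
```

Counting steps between bit patterns, rather than subtracting the values, keeps the measure independent of magnitude: a 1 ULP gap means "adjacent representable numbers" whether the results are near 1e-10 or 1e+30.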

🛠️ Our Testing Framework

We developed a vendor-agnostic GPU numeric testing tool that:

  1. Selects Kernels
    • Linear algebra (GEMM, LU decomposition)
    • Signal processing (FFT)
    • Element-wise operations (exp, log, sin, cos)
    • Reduction operations (sum, dot product)
  2. Runs on Both Platforms
    • NVIDIA A100 with CUDA toolkit
    • AMD MI250X with ROCm stack
  3. Compares Outputs
    • Element-wise comparison
    • ULP measurement for floating-point differences
    • Relative error checks against tolerance thresholds
  4. Classifies Differences
    • Hardware-specific rounding
    • Math library implementation differences
    • Precision truncation or extension
    • Algorithm choice (e.g., blocked vs unblocked GEMM)
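
The comparison step above can be sketched roughly as follows. This is an assumed, simplified version of such a check, not the tool itself: `compare_outputs`, its report fields, and the default tolerance are hypothetical, and the sample values are made up for illustration.

```python
import math

def compare_outputs(ref, other, rel_tol=1e-6):
    """Compare two equal-length result lists element by element.

    Returns a dict with the number of differing elements, the largest
    relative error, and the indices whose relative error exceeds rel_tol.
    """
    if len(ref) != len(other):
        raise ValueError("outputs must have the same length")
    mismatches, max_rel, failures = 0, 0.0, []
    for i, (r, o) in enumerate(zip(ref, other)):
        if math.isnan(r) and math.isnan(o):
            continue  # both NaN: treated as agreeing in this sketch
        if r == o:
            continue  # equal values (+0.0 and -0.0 also count as equal)
        mismatches += 1
        if math.isfinite(r) and math.isfinite(o):
            # r != o here, so the larger magnitude is nonzero.
            rel = abs(r - o) / max(abs(r), abs(o))
        else:
            rel = math.inf  # NaN or infinity on one side only
        max_rel = max(max_rel, rel)
        if rel > rel_tol:
            failures.append(i)
    return {"mismatches": mismatches, "max_rel_error": max_rel, "failures": failures}

# Made-up outputs standing in for results copied back from two devices.
device_a = [1.0, 2.0, 3.0]
device_b = [1.0, 2.0 + 1e-9, 3.1]
print(compare_outputs(device_a, device_b))
```

Reporting both the raw mismatch count and the tolerance failures separates "not bitwise identical" from "different enough to matter", which are distinct questions for reproducibility and for accuracy.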

📊 Key Findings


🧠 Example: Exponential Function (exp)

For large positive inputs, NVIDIA’s implementation used a fused polynomial approximation, while AMD’s used a different approximation table.
Result: up to 3 ULP difference for extreme values.
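
As a rough illustration of how two valid evaluation strategies for exp can disagree in the last bits, the sketch below scans large positive inputs and reports the largest ULP gap between two stand-in implementations. Neither stand-in reflects NVIDIA's or AMD's actual code; they are hypothetical CPU-side substitutes (a once-rounded reference versus squaring exp(x/2) with float32 rounding at each step), and the input range is an assumption chosen to stay below float32 overflow.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f32_bits(x: float) -> int:
    """Bit pattern of x as float32; for positive finite values it is monotonic."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def exp_reference(x: float) -> float:
    """Stand-in A: double-precision exp rounded once to float32."""
    return to_f32(math.exp(x))

def exp_by_squaring(x: float) -> float:
    """Stand-in B: exp(x) = exp(x/2)**2 with float32 rounding after each step."""
    half = to_f32(math.exp(x / 2.0))
    return to_f32(half * half)

def max_ulp_gap(f, g, inputs):
    """Return (largest ULP gap, input where it occurs) for positive finite outputs."""
    worst, worst_x = 0, None
    for x in inputs:
        gap = abs(f32_bits(f(x)) - f32_bits(g(x)))
        if gap > worst:
            worst, worst_x = gap, x
    return worst, worst_x

# 80.0 to 88.5 in steps of 0.25: large results that still fit in float32.
large_inputs = [80.0 + 0.25 * i for i in range(35)]
gap, where = max_ulp_gap(exp_reference, exp_by_squaring, large_inputs)
print(f"max gap: {gap} ULP at x = {where}")
```

The same harness could take GPU results in place of the two stand-in functions; the point is only that each intermediate rounding in an approximation scheme can shift the final result by a few representable values.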


🚀 Recommendations for Developers


📌 Conclusion

Our work shows that GPU vendor choice can subtly impact numerical results.
By understanding and measuring these differences, developers can make informed decisions about portability, reproducibility, and reliability in HPC and ML applications.


📄 Read the full paper:
Testing GPU Numerics: Finding Numerical Differences Between NVIDIA and AMD GPUs (SC24-W)

💻 Source code:
GitHub Repository