Blog posts

2025

Automatically Detecting Numerical Instability in ML via Soft Assertions

3 minute read

Published:

Machine learning (ML) models run on massive datasets and often perform billions of floating-point calculations.
But here’s the problem: small numerical errors can snowball into completely wrong predictions — and sometimes, you won’t even see a NaN or an error message.
This is numerical instability, and it’s sneaky.
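
To make the "no NaN, no error" point concrete, here is a minimal sketch. It is not taken from the post: the function name `naive_log1p` and the input value are illustrative assumptions. It shows a log-style computation that silently loses all information about its input.

```python
import math

def naive_log1p(x: float) -> float:
    # Hypothetical helper for illustration only.
    # 1.0 + x rounds to exactly 1.0 when x is far below machine epsilon
    # (about 2.2e-16 for float64), so x is lost before log is applied.
    return math.log(1.0 + x)

x = 1e-17
naive = naive_log1p(x)    # silently 0.0: no NaN, no exception
stable = math.log1p(x)    # keeps the tiny value by avoiding the 1.0 + x step

print("naive :", naive)
print("stable:", stable)
```

The naive result is a perfectly ordinary float, which is why this kind of bug can pass unnoticed through a pipeline that only checks for NaN or Inf.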

Testing GPU Numerics: Finding Numerical Differences Between NVIDIA and AMD GPUs

2 minute read

Published:

When you run the same GPU program on an NVIDIA GPU and an AMD GPU, you might expect identical results.
Surprisingly, that’s not always the case — even small floating-point differences can lead to divergent outcomes in high-performance computing (HPC) and machine learning workloads.
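
One general reason two platforms can disagree is that floating-point addition is not associative, so a different evaluation order can change the result. The sketch below runs on the CPU in plain Python and only illustrates that property; it is an assumption that ordering plays a role in any particular NVIDIA/AMD difference, and other causes (such as differing math library implementations) are possible.

```python
# Illustrative values chosen so the rounding is easy to follow.
a, b, c = 1e16, -1e16, 1.0

# Order 1: the large values cancel first, so c survives.
left = (a + b) + c

# Order 2: c is absorbed into b by rounding (float64 spacing near 1e16 is 2),
# so the final sum loses it entirely.
right = a + (b + c)

print("left :", left)
print("right:", right)
print("equal:", left == right)
```

Both orderings are valid ways to sum the same three numbers, yet they return different answers, which is the kind of small discrepancy that can grow in long-running HPC or ML workloads.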