July 8, 2025

Automatically Detecting Numerical Instability in ML via Soft Assertions

Machine learning (ML) models run on massive datasets and often perform billions of floating-point calculations. But here’s the problem: small numerical errors can snowball into completely wrong predictions — and sometimes, you won’t even see a NaN or an error message. This is numerical instability, and it’s sneaky.

In our FSE 2025 paper, we introduce Soft Assertions — a new approach to detect and trigger these hidden bugs automatically.


🔍 Why Numerical Instability Matters

Numerical bugs in ML can:

Example:
In one case study, a brain tumor detection model trained on MRI scans predicted “no tumor” when a tumor was clearly present — all because of an unstable function deep inside a neural network layer.
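
For a much smaller illustration of a "silent" numerical error than the case study above, here is a minimal sketch in plain Python. It is not taken from the paper; the function name `naive_log1p` is made up for this example. It computes log(1 + x) the obvious way and returns a wrong answer without raising an error or producing a NaN.

```python
import math

def naive_log1p(x: float) -> float:
    # 1.0 + x rounds to exactly 1.0 once x is smaller than float64 resolution
    # (roughly 1.1e-16), so the information in x is lost before log() runs.
    return math.log(1.0 + x)

x = 1e-17
naive = naive_log1p(x)    # 0.0: the input was rounded away
stable = math.log1p(x)    # the standard library's stable variant keeps x

print("naive :", naive)
print("stable:", stable)
print("NaN or INF in naive result?", math.isnan(naive) or math.isinf(naive))
```

The naive result is a perfectly ordinary float, which is why a check that only looks for NaN or INF would miss it.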


💡 What Are Soft Assertions?

Think of soft assertions as learned runtime guards for ML code.
Instead of writing a rigid assert statement, we train a small ML model to recognize “dangerous” input values for certain functions — like cosine_similarity, exp, or matmul.

When the program runs:

  1. The soft assertion checks the inputs at a numerically unstable function.
  2. If instability is likely, it tells the fuzzer how to mutate the inputs to trigger the bug.
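
The two runtime steps above can be sketched roughly as follows. This is an illustration under loud assumptions, not the paper's implementation: `toy_exp_assertion` is a hand-written rule standing in for a trained soft assertion, and `fuzz`, `is_unstable`, and the step size are hypothetical names and values chosen for this example. Only the three signal names come from this post.

```python
import math
import sys

INCREASE, DECREASE, NO_CHANGE = "increase", "decrease", "no change"

# Largest x for which float64 exp(x) is still finite (about 709.78).
EXP_LIMIT = math.log(sys.float_info.max)

def toy_exp_assertion(x: float) -> str:
    """Hypothetical stand-in for a learned soft assertion on exp().

    A real soft assertion would be a trained model; this rule only mimics
    the kind of signal it returns. It never needs DECREASE for exp().
    """
    return NO_CHANGE if x > EXP_LIMIT else INCREASE

def is_unstable(x: float) -> bool:
    """Oracle for this toy case: does exp(x) overflow?"""
    try:
        return math.isinf(math.exp(x))
    except OverflowError:  # Python's math.exp raises instead of returning inf
        return True

def fuzz(x: float, assertion, step: float = 50.0, max_iters: int = 100):
    """Mutate x in the direction the assertion suggests.

    Returns (input, number_of_mutations) once the assertion reports
    NO_CHANGE, or (None, max_iters) if the budget runs out.
    """
    for i in range(max_iters):
        signal = assertion(x)
        if signal == NO_CHANGE:
            return x, i
        x += step if signal == INCREASE else -step
    return None, max_iters

found, mutations = fuzz(1.0, toy_exp_assertion)
print(found, mutations, is_unstable(found))
```

The point of the sketch is the control flow: the fuzzer does not mutate blindly, it asks the assertion which direction to move at the unstable function and stops when the input is predicted to trigger the bug.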

This is different from:


🛠️ How It Works

  1. Build a database of unstable ML functions (we found 61 in PyTorch & TensorFlow).
  2. Design oracles to decide if a function’s output is wrong (NaN/INF, out of range, wrong value, etc.).
  3. Generate training data by running unit tests on these functions.
  4. Train a classifier (we used Random Forest) to predict:
    • increase → increasing the value will cause instability
    • decrease → decreasing will cause instability
    • no change → this input already causes instability
  5. Guide fuzzing using these predictions to reach instability faster.
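
Steps 2 to 4 might look something like the sketch below, again in dependency-free Python. Several things here are assumptions made for illustration: the target function `naive_log_sigmoid`, the random sampling used in place of unit tests, the probing rule in `label`, and above all the classifier. The paper used a Random Forest; this sketch swaps in a one-nearest-neighbour lookup so that it runs without any libraries.

```python
import math
import random

INCREASE, DECREASE, NO_CHANGE = "increase", "decrease", "no change"

def naive_log_sigmoid(x: float) -> float:
    """log(sigmoid(x)) written naively; exp(-x) overflows for very negative x."""
    return math.log(1.0 / (1.0 + math.exp(-x)))

def is_unstable(fn, x: float) -> bool:
    """Oracle (step 2): flag NaN/INF, treating Python's math errors as failures."""
    try:
        y = fn(x)
    except (OverflowError, ValueError, ZeroDivisionError):
        return True
    return math.isnan(y) or math.isinf(y)

def label(fn, x: float, reach: float = 2000.0, step: float = 10.0):
    """Assumed labelling rule (step 3): probe outward until the oracle fires."""
    if is_unstable(fn, x):
        return NO_CHANGE
    d = step
    while d <= reach:
        if is_unstable(fn, x + d):
            return INCREASE
        if is_unstable(fn, x - d):
            return DECREASE
        d += step
    return None  # no instability within reach; the sample is discarded

def train(samples):
    """Step 4 stand-in: 1-nearest-neighbour 'training' just stores the data."""
    return [(x, y) for x, y in samples if y is not None]

def predict(model, x: float) -> str:
    """Return the label of the closest training input."""
    return min(model, key=lambda pair: abs(pair[0] - x))[1]

rng = random.Random(0)  # fixed seed so the run is reproducible
inputs = [rng.uniform(-1500.0, 500.0) for _ in range(200)]
model = train((x, label(naive_log_sigmoid, x)) for x in inputs)

for probe in (100.0, -1000.0):
    print(probe, "->", predict(model, probe))
```

In step 5, a fuzzer would call something like `predict` on the inputs it observes at the unstable function and move them in the predicted direction, instead of mutating at random.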

📊 Key Results

We tested our Soft Assertion Fuzzer on:

Findings:


🧠 Case Study: Tumor Detection Model

Before fix:
Confidence in “no tumor” = 0.4990
After fix:
Confidence in “tumor” = 0.8874

Figure: Left — before fix, model incorrectly predicted “no tumor”; Right — after fix, correct “tumor” prediction.


🚀 Why This Matters

Soft Assertions bridge a gap between:

They let developers:


📄 Read the full paper:
Automatically Detecting Numerical Instability in ML via Soft Assertions (FSE 2025)
💻 Source code:
Soft Assertion Fuzzer on GitHub