Automatically Detecting Numerical Instability in Machine Learning Applications via Soft Assertions
FSE 2025 paper introducing Soft Assertions to detect and trigger numerical instability bugs in ML applications.
Anwar Zahid
I study how AI and numerical software fail in practice, then build testing and debugging tools that make those failures easier to detect, reproduce, and fix.
I am a Ph.D. student in Computer Science at Iowa State University, advised by Prof. Wei Le in the Program Analysis and AI Lab. My research focuses on reliable AI systems, numerical debugging, and software engineering techniques for machine learning systems.
Before starting my Ph.D., I worked as a software engineer on government, banking, AI, and mobile platforms. That industry background shapes how I approach research: I care about methods that can become practical tools for developers and researchers.
Selected projects from existing repository entries.
FSE 2025 paper introducing Soft Assertions to detect and trigger numerical instability bugs in ML applications.
Contributed to the LLNL Varity project by implementing HIP backend generation for GPU kernel testing, enabling cross-platform numerical consistency evaluation.
Extended a class project to evaluate hate speech detection models using geographical metadata. Later adapted for large language model testing and published in 2025.
Developed a real-time face recognition and spoof detection system for FinTech applications, enhancing security for remote banking verification.
Implemented Single Sign-On authentication for the PRP system, enabling secure unified access across parliamentary resource modules.
Implemented and evaluated ML algorithms for natural language processing tasks. Later extended for LLM experiments and a paper publication.
Developed an intelligent Othello game-playing agent using adversarial search algorithms and heuristic evaluation functions.
Created a Mancala game engine with an AI opponent using minimax and heuristic strategies.
Built a ray tracing engine from scratch to render 3D scenes with reflection, refraction, and shading.
Extended the Nachos instructional OS to implement thread scheduling, virtual memory, and file system operations.
Implemented and tested channel equalization algorithms to improve signal quality in noisy communication environments.
Peer-reviewed and preprint work on ML reliability, LLM evaluation, and numerical correctness.
Introduces Soft Assertions, a method for detecting and triggering hidden numerical instability bugs in machine learning applications.
Evaluates how large language models perform on hate speech detection when geographic and social context is included.
Studies numerical differences between NVIDIA and AMD GPU executions and their implications for reproducibility and portability.
Presents a conceptual software platform for virtual internship delivery and software development skill benchmarking.
Short research notes and engineering write-ups.
I want this blog to be a working notebook for the problems I keep returning to: machine learning reliability, numerical instability, debugging, and the engineering dec...
Machine learning (ML) models run on massive datasets and often perform billions of floating-point calculations. But here’s the problem: small numerical errors can snow...
When you run the same GPU program on an NVIDIA GPU and an AMD GPU, you might expect identical results. Surprisingly, that’s not always the case — even small floating-p...
The easiest way to reach me is by email. You can also find my code and academic profiles through the links below.