We are excited to introduce Ariel Kellison, one of Galois’s newest Research Engineers! Ariel recently won the Frederick A. Howes Scholar in Computational Science Award, which honors exceptional leadership, character, and technical achievement in the field of computational science.
In the article below, Ariel digs into her fascinating work at the intersection of formal methods and scientific computing – revealing how rigorous verification techniques can unlock greater confidence and correctness in large-scale simulations, from climate models to public health interventions and beyond.
You’re probably familiar with formal methods in the context of high-assurance software development, cryptography, and model-based systems engineering. In these domains, formal methods are used to rigorously verify that systems behave as intended, eliminating entire classes of bugs through mathematical proofs. But what you might not realize is that formal methods techniques are increasingly relevant in scientific computing—an area often viewed as too empirical, too large-scale, or too focused on performance for formal verification.
Scientific computing systems are no less critical than the software that controls aircraft or secures digital communications. They drive climate models that inform carbon policy and disaster planning, guide public health interventions, underpin efforts to modernize the U.S. power grid, and support national security decisions related to the stewardship of our nuclear weapons stockpile. These applications run on some of the world’s most powerful high-performance computing (HPC) systems, often involving millions of lines of code and complex interactions between software, hardware, and data. Mistakes in these systems aren’t just bugs; they can lead to wasted resources, flawed scientific conclusions, and a loss of public trust in science. As models grow in scale and importance, so too does the need to rigorously ensure their correctness.
These simulations aren’t just academic exercises. They are increasingly used to shape the world around us: how we prepare for the next hurricane, where infrastructure investments go, when we issue public health warnings, and even how we understand the universe. When answers matter this much, we need to be confident they’re right. And that’s exactly the kind of confidence that formal methods can provide.
Here’s the issue: these complex models and simulations require a vast number of simple arithmetic computations, which can now be performed at a rate of over a billion billion calculations per second. But even simple operations like addition and subtraction come with a catch: computers can’t represent most real numbers exactly. Real numbers are approximated using a fixed number of bits, typically as floating-point numbers. Nearly every floating-point calculation must be rounded to fit within the machine’s limited precision. While individual rounding errors may be tiny, their effects accumulate across trillions of steps, potentially distorting the final results in subtle but meaningful ways. And when those results shape real-world decisions, the cost of getting the math wrong can be enormous.
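To see how quickly these tiny errors add up, consider a minimal Python sketch (a toy illustration, not taken from any real simulation code): the decimal value 0.1 has no exact binary floating-point representation, so adding it to itself ten million times drifts measurably from the true answer.

```python
# Toy illustration: accumulated rounding error in IEEE double precision.
# The decimal 0.1 has no exact binary representation, so every addition
# below is rounded, and the per-step errors add up.

total = 0.0
steps = 10_000_000
for _ in range(steps):
    total += 0.1

expected = 1_000_000.0  # the exact real-number result of adding 0.1 ten million times
print(f"computed: {total!r}")
print(f"expected: {expected!r}")
# The drift is roughly on the order of 1e-4, far larger than the rounding
# error of any single addition.
print(f"drift:    {abs(total - expected):.2e}")
```

Each individual addition is accurate to roughly sixteen decimal digits, yet after ten million of them the answer is off in the fourth decimal place. Scale that up to trillions of operations and much more intricate arithmetic, and the concern becomes clear.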
This isn’t just a theoretical concern. In the ATLAS experiment at the Large Hadron Collider, which contributed to the discovery of the Higgs boson, small numerical inconsistencies led to missed or misidentified collisions. In another case, scientists working with an atmospheric component of a large-scale climate model found that results diverged dramatically from benchmark outputs after only a few days of simulated time. Even minor changes, such as running the model on a different number of processors, made it impossible to tell whether the discrepancies were caused by inherent chaos in the system or by avoidable numerical errors. In both cases, researchers lost confidence in their tools, productivity was disrupted, and scientific progress stalled because of seemingly small numerical issues hidden deep inside massive computations.
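The sensitivity to processor count has a simple low-level culprit: floating-point addition is not associative, so repartitioning a sum across a different number of processors changes the order of the additions, and with it the rounded result. The toy sketch below mimics that effect sequentially; the data and the simulated rank counts are purely illustrative.

```python
# Toy illustration: floating-point addition is not associative, so the same
# global sum computed with different data partitionings (as happens when a
# simulation runs on a different number of processors) can give slightly
# different answers.
import random

random.seed(0)
values = [random.uniform(-1.0, 1.0) * 10.0**random.randint(-8, 8)
          for _ in range(100_000)]

def partitioned_sum(xs, n_ranks):
    """Sum each contiguous chunk (one per simulated rank), then combine."""
    chunk = len(xs) // n_ranks
    partials = [sum(xs[i * chunk:(i + 1) * chunk]) for i in range(n_ranks)]
    partials[-1] += sum(xs[n_ranks * chunk:])  # leftover elements
    return sum(partials)

for ranks in (1, 4, 64, 1024):
    print(f"{ranks:5d} ranks -> {partitioned_sum(values, ranks)!r}")
# The printed sums typically differ in their trailing digits even though the
# inputs are identical; only the order of the additions changed.
```

None of these answers is "wrong" in isolation, which is exactly the problem: without a principled analysis, there is no way to tell whether the spread reflects harmless reordering or a genuine numerical defect.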
At this scale, testing and debugging are not only technically difficult but also extremely costly. Time on supercomputers is limited and expensive. Running a full-scale simulation just to uncover a bug can take days. And diagnosing the root cause of a divergence across millions of operations and thousands of parallel threads is time-consuming and nearly intractable without the right tools.
These issues raise a natural question: can we catch subtle numerical errors before running these expensive simulations? My research shows that we can—using a static, type-based approach to rounding error analysis that provides sound guarantees about how errors accumulate.
We’ve built programming languages with specialized type systems that can reason about numerical accuracy and precision. Just as Rust’s type system ensures memory safety by tracking ownership of data, our type system tracks how rounding errors propagate through a computation. This approach draws on techniques from sensitivity analysis and linear type systems, enabling the language to automatically reason about the accumulation of roundoff error without needing to execute the program.
This static approach is especially attractive in high-performance scientific computing, where simulations are expensive to run and the cost of a bug can be enormous. With our type system, programmers can write code with built-in guarantees: the program either passes the type checker, ensuring bounded rounding error, or is rejected before it ever runs. This offers a form of practical numerical assurance, giving researchers confidence that their computations are not just fast, but also trustworthy.
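To give a flavor of the idea, here is a deliberately simplified Python sketch. It is not the type system itself (that is a statically checked language, and the analysis happens without running the program); it only shows how per-operation error bounds compose under the standard model of floating-point arithmetic, in which every double-precision operation introduces a relative error of at most the unit roundoff u = 2^-53, and how a computation whose guaranteed bound exceeds a tolerance can be rejected. All names and numbers below are illustrative.

```python
# Toy sketch (illustrative only, not the actual type system): each value is
# paired with a sound bound on its accumulated absolute rounding error, and
# the bound is propagated through every operation using the standard model
#   fl(a op b) = (a op b) * (1 + d),  |d| <= U,
# where U = 2**-53 is the unit roundoff for IEEE double precision.
# The sketch ignores overflow/underflow and the (second-order) rounding of
# the bound arithmetic itself, which a real tool must account for.

U = 2.0 ** -53  # unit roundoff for binary64

class ErrVal:
    def __init__(self, value, err=0.0):
        self.value = value   # the computed (rounded) value
        self.err = err       # bound on |computed - ideal real-number result|

    def __add__(self, other):
        v = self.value + other.value
        # error carried by the inputs, plus fresh rounding of the sum itself
        e = self.err + other.err + U * abs(v)
        return ErrVal(v, e)

    def __mul__(self, other):
        v = self.value * other.value
        # input errors scaled by the operands, plus fresh rounding of the product
        e = (abs(self.value) * other.err
             + self.err * (abs(other.value) + other.err)
             + U * abs(v))
        return ErrVal(v, e)

def check(result, tolerance):
    """Reject the computation if its guaranteed error bound is too large."""
    if result.err > tolerance:
        raise ValueError(f"error bound {result.err:.3e} exceeds {tolerance:.3e}")
    return result.value

# Example: evaluate 0.1*x*x + 0.2*x + 0.3 at x = 1e8 and demand the result
# be accurate to within 1e-6 in absolute terms.
x = ErrVal(1e8)                                   # exactly representable input
c1, c2, c3 = ErrVal(0.1, U * 0.1), ErrVal(0.2, U * 0.2), ErrVal(0.3, U * 0.3)
poly = c1 * x * x + c2 * x + c3
print(poly.value, "+/-", poly.err)

try:
    check(poly, 1e-6)
except ValueError as exc:
    print("rejected:", exc)  # at this scale the guaranteed bound is far above 1e-6
```

In the actual type-based setting, this bookkeeping lives in the types rather than in runtime values, and the accept-or-reject decision is made once, by the type checker, before the program ever touches a supercomputer.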
Ultimately, my work helps scientists model with greater confidence by making bounds on rounding error an explicit, checkable property of the code itself. It enables the construction of high-assurance computational building blocks, so that as models scale, trust in their conclusions scales with them.
Looking ahead, this approach has the potential to reshape how we write scientific software. As models and simulations grow even more complex and are deployed in real-time, safety-critical, or policy-making contexts, we’ll need verification tools that scale with the science. We’re especially interested in future directions that integrate versions of these type systems into mainstream scientific workflows and programming languages. In the end, the goal isn’t just to catch more bugs; it’s to make correctness and trust a priority in scientific computing, not secondary to performance.