Galois’ mission is improving the trustworthiness of critical systems. Trustworthiness is an inherent property of a system, but we need to produce evidence of its trustworthiness in order for people to make informed decisions. The evidence, and its presentation is a key part of what is often called an assurance case.
The kinds of evidence that make up an assurance case can differ, depending on how critical the system is. The more critical, the more thorough and convincing the assurance case should be. Formal methods — mathematical proofs of software correctness — are often called for when evaluating the most critical systems. A formal method approach to assurance produces a strong argument about a model of the system. In contrast, testing produces a weak argument about the system itself. The reason it is a weak argument is that for any non-trivial component, testing will only exercise a minuscule percent of the possible input space. In contrast, a formal proof says that for a model of the system, some property holds for all inputs. Phrased this way, it’s clear that testing and formal evidence provide complementary evidence of a system’s correctness.
In this post we describe a fast, formally verified C implementation of Base64 encoding. The 10,000 foot description of the proof approach is:
The code and proof are available here:
https://github.com/davidlazar/base64
The following sections describe the different parts of this repository.
The C code in base64encode.c
is a fast implementation of Base64 encoding. It is based on libb64, which uses coroutines to achieve speed. The nontrivial control-flow of this code makes proving it correct more challenging. This is the artifact we want to develop assurance evidence for.
Typing “make” at the top of our repository builds the b64enc
tool, a simple frontend to the C code. Here is a quick timing comparison between our code and the base64
utility that is part of GNU Coreutils:
$ time b64enc 500MB.random > x.b64
real: 1.197s user: 0.61s cpu: 0.48s
$ time base64 -w0 500MB.random > y.b64
real: 2.615s user: 1.81s cpu: 0.62s
$ diff -w x.b64 y.b64
$
The Cryptol specification in base64.cry
is based on RFC 4648. Ignoring padding and assuming Cryptol is in big-endian mode, the encoding routine is just two lines:
encode : {a c} (fin a, c == (8 * a + 5) / 6, 6 * c >= 8 * a) =>
[a][8] -> [c][8];
encode xs = [| alphabet @ x || x <- groupBy(6, join xs # zero) |];
The corresponding decode
function is similarly short. The b64encode
and b64decode
functions are wrappers around encode
and decode
that handle padding.
Our specification provides the following correctness theorem, polymorphic over all lengths of x
:
theorem b64encodeLeftInv: {x}. b64decode (b64encode x) == x;
Cryptol can’t prove polymorphic theorems on its own, so we must monomorphize the theorem to prove it:
base64> :prove (b64encodeLeftInv : [16][8] -> Bit)
Q.E.D.
base64> :prove (b64encodeLeftInv : [200][8] -> Bit)
Q.E.D.
This proof/
subdirectory contains infrastructure for proving the C code correct. The Makefile in this directory orchestrates the proof. Typing make n=16
generates and proves a theorem that says the base64_encode
function from the C code (as compiled by Clang) is equivalent to the b64encode
function in the Cryptol specification for the given input length n
:
$ make n=16
Proving function equivalent to reference:
encode_aig : [16][8] -> [24][8]
Q.E.D.
Let’s see what the Makefile is doing behind the scenes.
The C code in sym_encode.c
is a wrapper around our C code that passes a symbolic value to the base64_encode
function. This code is compiled by Clang to LLVM bytecode.
The LLVM Symbolic Simulator (a prototype tool currently under development by Galois) is used to extract a formal model encode_aig
of the LLVM bytecode. The model can be loaded into Cryptol and used like any other function:
proof> :t encode_aig
encode_aig : [16][8] -> [24][8]
proof> encode_aig "16 characters..."
"MTYgY2hhcmFjdGVycy4uLg=="
In particular, we can write a theorem about this function:
theorem MatchesRef : {x}. encode_aig x == b64encode x;
…and then prove it:
proof> :prove MatchesRef
Q.E.D.
Success! Amazingly, this proof systems scales to large values of n
where exhaustive checking is not feasible:
$ time make n=1000
Proving function equivalent to reference:
encode_aig : [1000][8] -> [1336][8]
Q.E.D.
real: 17.882s user: 16.31s cpu: 1.50s
The success of the proof gives us high confidence that our C code is correct.
To reiterate our steps:
The weak-link of many approaches to using formal methods applied to the software correctness challenge is the model-to-code correspondence. The approach we took above addresses that weakness by automatically generating the model via symbolic simulation of a low-level representation of the program — in this case, the LLVM bytecode. This approach would miss bugs or malware in the path from LLVM to executable, but remains a compelling argument for the correctness of the C code. In critical applications, it makes sense to include the compiler in the scope of an overall assurance case.