This is the second in a series of three blog posts detailing the use of SAW and Cryptol to prove the correctness of the HMAC implementation in Amazon’s s2n TLS library. Part one: Verifying s2n HMAC with SAW. Part three: Proving Program Equivalence with SAW.
In the first post, we described how we proved equivalence between a mathematical description of the HMAC algorithm:
HMAC(K, m) = H((K' ⊕ opad) || H ((K' ⊕ ipad) || m))
and the s2n C implementation of the HMAC algorithm, which is written in 166 lines of C code. The first step in this process was to write a Cryptol specification that we can visually trust defines the same thing as the mathematical definition.
Despite the fancy name, specifications aren’t a fancy or novel idea. In fact, our ancient ancestors used specifications. A parent caveman would specify a tiger to his kids “A large orange and black cat is a tiger. It tastes bad and will eat you. Run when you see it.” In this case, the specification is “large orange and black cat.” It doesn’t tell you how to make a tiger, and frankly you don’t care (or even want to know). As an ancient human, all you really need to know is how to identify a tiger, so that you can appropriately run from it instead of trying to hunt it.
Now we don’t see many tigers, and modern parents say “Read this Cryptol spec for HMAC. A bad implementation behaves differently. Those implementations taste bad and will eat you. Run when you see them.” As an added benefit, those parents no longer get asked to read bedtime stories, leaving them free to prove more programs correct.
Humans like to use language to convey their specifications. This works well when humans need to be able to both understand the specification and check against it. In the case of verification, the specifications have double duty. They need to be understandable by both humans and computers, so that computers can do the checking.
Cryptol is a language that was written for precisely this purpose. It comes as no surprise, then, that it is generally less work to verify equivalence between two Cryptol programs than a Cryptol program and a C program, or two C programs. We can leverage the fact that program equivalence is a transitive property to perform the proof in layers, so that we can do as much work using only Cryptol specifications as possible.
To illustrate this, we can think of the proof between the high level Cryptol spec and the C code as a distance that we need to traverse. Now imagine that we have two cars. One is really fast, but only runs on smooth roads (Cryptol to Cryptol equivalence). The other is much slower, but is capable of reaching all of the way to our end goal (equivalence involving C). In general, we would like to take the fast car as far as we can (as long as we don’t have to go too far out of our way to keep taking it) before switching over to the all-terrain vehicle for the last leg. This blog post is about choosing the point where we should switch cars.
We listed the high-level Cryptol specification in the previous blog post, but we will do it again here in order to examine the type signatures and learn more about Cryptol:
hmac : { msgBytes, pwBytes, digest, blockLength }
( fin pwBytes, fin digest, fin blockLength )
=> ([blockLength + msgBytes][8] -> [8*digest])
-> ([blockLength + digest][8] -> [8*digest])
-> ([pwBytes][8] -> [8*digest])
-> [pwBytes][8]
-> [msgBytes][8]
-> [digest*8]
hmac hash hash2 hash3 key message = hash2 (okey # internal)
where
ks : [blockLength][8]
ks = kinit hash3 key
okey = [k ^ 0x5C | k <- ks]
ikey = [k ^ 0x36 | k <- ks]
internal = split (hash (ikey # message))
In the first line, we declare a number of type variables. These variables are numeric values, but types are allowed to depend on them. Next, we list constraints on those variables. In this case we say that some of the variables are finite (fin), but we can do more interesting things as well.
For example, if we hated security, we could add in 3 >= pwBytes. Then if a security zealot with a 4 character password (“1234” instead of our typical “123”) tried to call our function they’d get the message:
We should note that this is a static check. Like all of your favorite typechecking properties, this check occurs when the Cryptol source is loaded, not when it is executed. This means that it is also impossible for another function to use our new, small key only, version of HMAC unless it can guarantee that the argument it supplies to the HMAC function is always shorter than 3 bytes.
The type variables are also values, however, so we can do computation on them as well. For example we could write a function that checks to see if 3 >= pwBytes and returns the message “chill out please, your key is too long” instead of running HMAC if the key is too long. This code would interpret fine, and only complain when it was actually called with a long key.
We, of course, love security, and authentication of our “DARKEST SECRETS” is certainly not protected by the password “123”. We just had IT add the “4”.
With the exception of that type fanciness, Cryptol mostly behaves like the functional languages you already know and love (you do love them right??).
So far, we’ve talked about HMAC as if it is a single function that takes a message and a key and outputs an authentication code. In reality, this can be terribly inefficient. The message could become available gradually, and waste valuable processing time as it arrives. To help with this problem, the C code offers an initialize/update/digest interface, which allows the authentication code to be constructed incrementally as the message arrives.
To prove that the C code is equivalent to the highest level spec, we would like to make a lower level Cryptol specification that makes two major steps in the direction of the C code.
In order to model the state, we declare a record in Cryptol that matches the struct that is declared in the C code.
So in the C code we see:
struct s2n_hmac_state {
s2n_hmac_algorithm alg;
uint16_t hash_block_size;
uint32_t currently_in_hash_block;
uint16_t block_size;
uint8_t digest_size;
…
}
And in Cryptol our state is
type HMACState = {
alg : [32]
, hash_block_size : [16]
, currently_in_hash_block : [32]
, block_size : [16]
, digest_size : [8]
…
}
Each of the types (which occur after the colons) specify the size of the input in bits. Now all of our HMAC functions will be transformations over a HMACState. For example the hmac_init function:
hmac_init : { pwBytes }
( fin pwBytes, 64 >= width (8*pwBytes) )
=> [pwBytes][8]
-> HMACState
hmac_init key =
…
takes a key (of size ([pwBytes][8]), since size specification is in bits and there are 8 bits in a byte, you can think of this as pwBytes bytes) as an argument and creates a new HMACState. The update function takes a HMAC state and (part of) a message and returns an updated HMACState. The digest function takes the HMACState and returns a finalized HMACState and an output whose size matches the output of the hash function being used.
One of the most interesting parts of verifying HMAC is that internally HMAC can use a wide variety of hash functions. The HMAC state carries around a number of states for these hash functions, but the HMAC algorithm isn’t very concerned with the states and simply assumes that they interact properly with the provided hash functions. It can’t really care about the result of the hash, and if it could, we probably wouldn’t want to use that hash.
For the purpose of specification, then, it really doesn’t matter what our Cryptol code does once it reaches the call to the hash function because our verification won’t care. We will discuss how we achieve this and why in the next post. What we might want to do, however, is execute our specification on real inputs in order to further convince ourselves of its correctness. Furthermore, if our verification turns up a counterexample that might cause the specification to differ from the implementation, we would like to be able to actually run that counterexample.
In order to allow the hash function to be swapped out easily, we define intermediate functions (we leave out their definitions, since these functions simply rename another function):
hash_init_c_state : SHA512_c_state -> SHA512_c_state
hash_update_c_state : {msg_size} (fin msg_size) => SHA512_c_state -> [msg_size][8] ->
SHA512_c_state
hash_digest_c_state : {digest_size} (fin digest_size) => SHA512_c_state -> [digest_size][8]
These functions allow us to easily swap out the hash implementation when it is convenient. We could also define all of these functions to be undefined a value which causes an error if it is evaluated. Even though the evaluation would cause an error, during verification we will always stop evaluation at the point of those functions, meaning that a function that can’t be evaluated might still represent a valid specification.
If these functions are generic why do we see mention of SHA512 in the types? To allow our function to be generic despite needing different states, we took a page from the C book. In C, a union of types is statically given the size of the largest type in the union. In the case of the hashes available in s2n, this is SHA512. Because it is the biggest state, any of the other states can fit inside of it. In order to define a function with the type required by our hash_c_state functions, we need a function to translate e.g. a SHA512 state that contains a SHA256 state into just a SHA256 state so the SHA256 Cryptol implementation can do its work. Once that work is done, the state can be put back into the SHA512 state.
We made the state and interface in the Cryptol specification match those in the C program, but there are still a few places that things don’t match as closely.
As an example, s2n has a statement specifically designed to avoid a timing attack on x86 processors:
state->currently_in_hash_block += (4294949760 + size) % state->hash_block_size;
As the comment in the source code states, the constant 4294949760 is a multiple of all possible values of state->hash_block_size and size is always small enough that there will never be any overflow. That means that this code should be functionally equivalent to the code that doesn’t add in the large constant. The constant is actually there because on many architecture the modulo operator behaves significantly more efficiently on a small input. That means that timing the update function could leak information about the length of the message. Ensuring that the argument to the modulo operator is always large avoids this timing attack.
The current Cryptol code is:
currently_in_hash_block = (s.currently_in_hash_block + ((128000 + `n) % (zero # s.hash_block_size)))
Which has a different constant, that is also functionally equivalent to not being there. The constant we used matched the C code when we started development, but the C code’s constant was increased to improve timing attack resistance. The update did not stop the proofs of correctness from working, so we did not modify our specification. And since the top-level specification (the one line of math at the top of this post) does not include either of these constants, our proof tells us that they do not have an effect on the input / output behavior of the hmac computation.
In this post, we have given two definitions of what it means to be HMAC. The first is meant to be visibly comparable to the mathematical specification that exists in standards and the second is meant to be similar to the structure of the C code. In the next post, we will show how we prove equivalence between the two Cryptol specs, and the incremental Cryptol spec and the C code.
Yeah, this is a blog post, but it’s still going to tell you what to do… After this you’ll still want to come back and read the next one right? Anyways, it’s a fun homework; you get to ask the same question over and over.
This is how a good verification researcher behaves. The question to ask is “why should I trust (or believe in) this?” So if someone comes to you with a tool and claims that it finds all undefined behavior in a C program, an interested researcher might ask “why should I trust your tool?”. Often the tool developer will give some argument about soundness. They might give this as if it is the whole answer, but don’t forget that you’re tasked with asking the same question over and over again. “Why should I trust that your soundness argument’s representation of the C language corresponds to what your compiler does?”
Perhaps the tool also depends on a library that helps generate side conditions .“Why should I trust the library?”. Perhaps the library has a soundness proof of its own and you can ask more questions.
Eventually, while asking “why should I trust this?” you will hit a point at which you can go no further. Ideally it will be some sort of axioms that are simple enough for you to verify visually, but this is rarely, if ever, the case. In reality, it takes a very long time to provide enough evidence that you only need to trust a few things to be true, and getting trust to this level often greatly slows further development of software as well. So developers of verification tools always walk a fine line between answering all of the questions, and actually getting their software developed and making it useful to their customers.
Your homework, then, is to ask “why should I trust this?” about the proofs we are presenting here as many times as possible. Write down a list of exactly what you need to believe in order to trust the SAW verification. Next to that write down the same list of things for s2n all on its own. Hopefully this will illuminate where SAW is helping with the trustworthiness of the code, and where we can apply future work to help make SAW more trustworthy. Our answers will be in the next post, we look forward to seeing yours!