At Galois, we are interested in expanding the capabilities of privacy-preserving technologies, as we believe such technology will play a vital role in our future privacy-sensitive world. One such technology that we’ve been exploring is Homomorphic Encryption (HE), a cryptographic mechanism that allows someone to perform computation on encrypted data. In a previous project, we’ve explored HE from a usability angle, and we are currently working on a project exploring HE from a hardware-performance angle. As part of these, we’ve played with several HE libraries to understand their capabilities and limitations. A common thread across these libraries is that, while they are extremely powerful, writing performant HE programs is a highly non-trivial task. That is, implementing an HE program is sadly not as simple as taking your python program and “running it in HE.” In particular, there are many things HE programs cannot do for security reasons—for example, looping over a private value—and there are many things HE programs should not do for performance reasons. Here we’ll focus on the performance piece.
To understand why one needs to be careful writing HE programs—and how certain choices may have significant impacts on HE performance—we need to briefly describe how existing HE approaches work. All existing HE approaches take some plaintext value and add “noise” to produce a resulting ciphertext. This “noise” is added in such a way that it can only be removed given the secret key (a.k.a. decryption) even after ciphertexts are manipulated (i.e., added or multiplied together). To recap, encryption involves adding noise, computation can be done on these noisy (a.k.a. encrypted) values, and decryption involves removing this noise to return the manipulated plaintext value.
While the above presents a high-level abstraction of how HE works, it’s sadly not as simple in practice. Every operation conducted on ciphertexts increases the noise, and some operations (like multiplication) produce a lot more noise than other operations (like addition). Also, and most importantly, there is a “noise cap,” at which point decryption no longer becomes possible—the ciphertext is just too noisy. This noise cap is constrained by complex parameters related to security and performance, and performance degrades rapidly as one increases the noise cap. So keeping this noise cap as small as possible is essential; however, it can’t be so small that the computation overflows the cap.
This is where noise estimation comes into play. Ideally, we could take a computation, determine how much noise it would produce in HE, and set the noise cap accordingly. Sadly, it is not that simple. This is because it is really hard to figure out exactly how much noise an operation adds. Fortunately, researchers have derived worst case bounds on the noise of various operations. So one approach would be to bound the noise of each operation and from there compute the overall noise of the program. However, this incurs a potentially significant overestimate: the worst case noise of the overall program may be much lower than the sum of the worst case noise of each operation.
This leads us to an observation: What if, instead of tracking the worst case noise bounds at an operation level, we track the worst case noise bound at a program level?
A strawman approach would be to run the program in HE and simply observe how the noise grows. Leaving aside the inefficiency of the approach, we can’t even run the program in HE because we haven’t yet decided on its parameters. That’s why we are doing noise estimation in the first place!
What we do instead is to symbolically evaluate the program, keeping track of how the noise grows. The evaluation computes the final noise as a polynomial whose variables are the original noise, plaintext, and encoding parameters. Finally, we can use the Monte Carlo method to sample the noise and the plaintext. That is, we repeatedly sample those variables to approximate the worst-case final noise in terms of the initial parameters. From there, it is easy to find the correct parameters not to overflow the “noise cap.”
An advantage of this approach is that it makes it easy to take advantage of public knowledge about the plaintext. For example, if an input is known to be a boolean (e.g., by type inference), we only need to sample that plaintext from 0 or 1. Using such plaintext bounds, we can significantly reduce the worst-case noise bound. To the best of our knowledge, this technique has not been used before.
Our work is only a proof-of-concept, and there is a lot more work to be done to make HE-MAN usable more broadly. We’ve thus far targeted Microsoft Research’s SEAL HE library, and supporting several of the other libraries—such as IBM’s HELib or Duality’s PALISADE library—would bring these capabilities to a larger audience. We’ve also focused on feasibility and not performance: because we are symbolically evaluating the HE program, things get out-of-hand quickly as the program grows. Fortunately, there are several known approaches to improving this which we’d like to integrate.