Measuring the Privacy of Computations

Secure computation enables users to compute some result without revealing the inputs. Privacy schemes that are shown to only reveal outputs are said to have input privacy. However, learning these outputs still tells you something about the private inputs. The important question is: "how much?

The Defense Advanced Research Projects Agency (DARPA) Brandeis program aims to enable safe and predictable sharing of data while preserving privacy. On this program, the Galois-led TAMBA team's focus has been to develop and apply new approaches to evaluating privacy-preserving systems. One tool we developed is luigi-qif, which can quantify how much an adversary who can observe (only) the output of a secure computation  can infer about the private data used in that computation.

luigi-qif evaluates privacy leakage of secure multi-party computation programs written in the popular MPC framework SCALE-MAMBA.luigi-qif is an analysis tool and doesn't perform the secure computation itself. Instead, luigi-qif takes the description of an MPC program (as unmodified Python SCALE-MAMBA code) as input, along with information about the known bounds of possible private data, and produces two metrics:

1. static leakage: an 'expected' level of leakage, i.e. an estimate of how much could leak without knowledge of what the program’s real output would be

2. dynamic leakage: the 'real' leakage associated with one specific output from the computation

These are both well-studied metrics in the field of Quantitative Information Flow (please see this tutorial for more background). If an adversary is trying to ‘guess’ the private data’s values, the leakage *values* provide a mathematical notion to how much an adversary is aided in their guess by seeing the output of the computation.

Using the Tool

If you and your friends each privately submit a number between 0 and 4 to a secure computation that prints only the sum of all your numbers, how much does seeing the result help someone guess what number you submitted?

We use luigi-qif to answer this question by giving the following inputs:

  • The unmodified SCALE-MAMBA code for the (secure sum) computation
  • A file describing the bounds of possible inputs

Let's quickly go over an example of each.

Our input program

Our tool analyzes unmodified SCALE-MAMBA programs, like the following which  performs a simple multi-party computation: take one input from each of 10 parties, and sum them together.

numbers = [sint.get_private_input_from(i) for i in range(10)]
acu = sint(0)
for x in numbers:
   acu += x
print_ln("RESULT: %s", acu.reveal())

Bounds on the possible inputs

Most computations have some bounds on the possible values that are valid inputs. For example, an unsigned 16-bit integer must at least be between 0 and 2^16-1, but in many cases the range of valid inputs are much narrower. Consider a private input known to be in the range between 0 and 100. If the adversary knows the inputs are between 0 and 100 then they'll never guess anything outside that range. If the computation reveals "the input was less than 500" doesn't help the adversary guess either since they already know it's between 0 and 100. Users of luigi-qif model this information about valid input values by providing a "priors" input file to accompany the associated MPC file.

Here's the priors file for our example computation that sums the 10 private values between 0 and 4. There is one line for each private variable, and each line contains the lower and upper bound for that value.

0 4
0 4
0 4
0 4
0 4
0 4
0 4
0 4
0 4
0 4

Running luigi-qif

With both our SCALE-MAMBA code and our .priors file, we can run luigi-qif to help us reason about how much can be inferred about our private data if we take part in this computation (i.e. about the potential leakage of the computation).

As mentioned in the Introduction, luigi-qif can provide two metrics, let's take a look at both, first static leakage:

>> luigi-qif --priors examples/sum.priors --mpc examples/sum.mpc static-leakage

Number of possible outputs: 41
STATIC LEAKAGE: 5.357552 bits out of 23.219281 bits total

And now dynamic leakage:

>> luigi-qif --priors examples/sum.priors --mpc examples/sum.mpc dynamic-leakage 8

Prior vulnerability: 1/9765625
Posterior vulnerability: 1/22110
Leakage: 8.786870 bits out of 23.219281 bits

Note that for dynamic-leakage we used the output value of 8. We can plot the dynamic leakage of each possible output for this program to see how the observed output relates to how much leakage would occur (this is possible for this program, but infeasible for programs with a large output space!):

Seeing the results shown in the Figure above helps us reinforce some of our intuitions. There are many possible ways for all the values to sum up to 20, i.e. low leakage. There is only way to sum to 0, i.e. high leakage. This is reflected in the results for those points on the x-axis. 

Use of leakage analysis in a realistic scenario

luigi-qif is capable of analyzing scenarios much more complex than the simple "sum" example above, where the privacy impact of sharing the result is less obvious. On the Brandeis program, the Enterprise Collaborative Research Team developed a scenario for Pacific-bordering nations' collaborative response to natural disasters in the Pacific. While many countries are willing to cooperate as part of emergency response, the optimal response depends on potentially sensitive data about the participants’ assets and capabilities. 

Coalition nations would like to cooperate in response to the disaster while ensuring that certain sensitive details of their capabilities remain private. This situation exemplifies the classic conundrum at the heart of privacy preserving technologies: the trade-off between privacy and utility. luigi-qif can help quantify these trade-offs and find a middle ground.

The Scenario

The goal of this scenario is for nations to determine an aid-distribution schedule to allocate their shared resources in a way that best responds to the disaster. This is a complex scenario where each participant has several types of private information (ship locations and capabilities), and the scheduling optimization process to allocate ships to ports involves many constraints (not all allocations are feasible). It is not obvious how much an adversary could infer about the private data from the resulting aid distribution schedule, due to the many constraints and types of information involved in this operation.

luigi-qif can fully model the schedule optimization algorithm, which will be performed in secure multiparty computation, and then reason about exactly how much the schedule can reveal about each participants' private information. This enables potential participants to make a decision about whether to participate in the computation that has been informed by a measure of potential privacy loss.

The Data

You can divide the actors in the scenario into two main groups: Aid providers and aid recipients. In the scenario, each of these has data that they consider private. For aid providers, this data corresponds to information about their ships:

  • Location
  • Length (size)
  • Draft (how deep the ship sits in the water)
  • Maximum speed

For aid recipients this data corresponds to information about the ports, generally, and specific berths at each port. The table below shows the data that is relevant to the port as a whole and to each berth individually:

Port informationBerth InformationAvailable?LengthOffload capacityOccupied start timeHarbor depthOccupied end time

As you can see the scenario not only has more data than our sum example, but the various dimensions of the data interact in non-trivial ways. Perhaps a ship is close enough to a port, and is capable of a speed that would reach that port, but the port harbour depth is too shallow. There are several constraints on the ability of a ship to dock at a specific berth at a specific time, including the position of other ships! This confluence of factors makes it more difficult to reason about what an adversary might learn about the private data from observing the final resource allocation solution.

As with our sum example above, all of these data fields are modelled in luigi-qif using a prior file that sets domain specific bounds on the possible values in our scenario.

If you are interested in more details on the scenario itself and other approaches to leakage analysis for this scenario, we co-wrote a paper for the 2019 Scheduling and Planning Applications Workshop with our colleagues at SRI.

Results of Running the Analysis

The TAMBA team has taken the resource allocation algorithm, written in SCALE-MAMBA and used luigi-qif to provide insight about what information can be inferred about the data provided by coalitions members.

As an information-theoretic measure, the results below are in 'bits'. You can think of each additional leaked bit as halving the possible number of guesses that an adversary might take. With that in mind, for any nation taking part in the coalition scenario, luigi-qif reports that the static leakage would be 9.1 bits of information. luigi-qif also reports this was out of 148.75 initial bits in the state space.

Alone knowing that an algorithm leaks 9.1 bits may not be actionable information. Indeed, luigi-qif comes into its own when used to compare the leakages of various approaches. We used luigi-qif to analyze various possible resource allocation strategies, giving analysts and decision makers the ability to quantify the privacy/utility tradeoff when making their decisions (please see our paper for more detail).

Furthermore, ‘bits’ may not be the most intuitive metric. Depending on the domain in question, you can use the results to display the information in a more intuitive way. Below we show a map where red indicates the possible locations of a ship from our scenario after the scheduling algorithm is run. Here you can more easily visualize what the loss of privacy corresponds to in the physical world:

Conclusion

Metrics and visualizations like the above provide insights and intuitions for policy makers and system analysts that they can then use to inform their decisions. In particular, tools like luigi-qif can add weight to the argument for deploying systems that use MPC, as the privacy/utility tradeoff can be made more explicit and avoid the pitfalls of 'all or nothing' collaboration between parties that may not fully trust each other with their secret data.

Building tools like luigi-qif is one way Galois works towards its goal of ensuring systems do _only_ what they are intended to do. In using MPC, it can be easy to think that all the inputs are private, they stay private after the output is revealed. Providing tools for reasoning about the degree to which that may or may not be true helps ensure that private data stays private, and that implementers reckon with the leakage that does exist.

Acknowledgments

This material is based upon work supported by the United States Air Force and DARPA under Contract No FA8750-16-C-0022. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force and DARPA.

DISTRIBUTION STATEMENT A. Approved for public release: distribution unlimited.