After four and a half years of work on the DARPA I2O Brandeis Program, we are excited to announce the completion of Jana, a project which set out to develop accessible privacy-preserving data as a service (PDaaS) to protect the privacy of data subjects while retaining data utility to users.
The Galois-led Jana project aimed to demonstrate how interoperable, open-source cryptographic algorithms, secure computation, relational databases, and privacy-preserving analytics can be combined to provide practical solutions that translate to real-world frameworks. We are thrilled to share some of the Jana project contributions, including technical innovations and real-world applications.
As the world’s capacity and effort to collect, store, and mine data continue to grow exponentially, organizations have come to depend on access to that data as a fundamental service underpinning their business models: Data as a Service (DaaS). Unfortunately, privacy assurance for that data has not kept pace. Data breaches occur by the thousands each year. Insider threats to privacy are commonplace. De-identification of data can often be reversed and has little in the way of a principled security model. Data synthesis techniques can only model correlations across data attributes for unrealistically low-dimensional schemas.
Jana was inspired by the challenge of developing capabilities to prevent unintended privacy gaps while maintaining the utility features important to relational database users. Our approach combines three elements: end-to-end data encryption that persists even during computation, formal methods to analyze privacy leakage, and differential privacy to mitigate that leakage.
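To make the differential privacy element concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a counting query. It is an illustration under stated assumptions, not Jana's implementation: the function names, the sample data, and the choice of epsilon are all hypothetical, and the noise is sampled in the clear with Python's non-cryptographic `random` module rather than inside a secure computation engine.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(rows, predicate, epsilon, rng):
    """Answer a counting query with epsilon-differential privacy.

    A count changes by at most 1 when one record is added or removed
    (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical data: ages of eight data subjects.
ages = [23, 35, 41, 29, 52, 67, 38, 44]
rng = random.Random()

# A smaller epsilon spends less privacy budget but adds more noise.
noisy_answer = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
print(f"noisy count of subjects aged 40 or over: {noisy_answer:.2f}")
```

The true answer here is 4; each call returns that value perturbed by noise whose spread grows as epsilon shrinks, which is the privacy-versus-utility trade-off in miniature.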
We tackled a lot throughout the Jana project, but we're especially proud of a few stand-out things about the Jana PDaaS system.
Jana confirmed some of the key challenges we expected when proposing this effort. Performance of queries evaluated in our linear secret sharing protocols remained disappointing, with JOIN-intensive and nested queries on realistic data running up to 10,000 times slower than the same queries without privacy protection. Generation of appropriate randomness inside our secret sharing engine for specifying differential privacy noise addition also encountered some performance constraints. The information-theoretic leakage of property-preserving encryptions was surprisingly stubborn, although novel techniques were developed by our team to address those problems. Finally, the fundamentals of differential privacy and its extension toward practical, understandable privacy budgets remained a challenge throughout the project.
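For readers unfamiliar with linear secret sharing, the following is a minimal sketch of the simplest variant, additive sharing over a prime field. It is not the protocol used in Jana; the modulus, the three-party setup, and all function names are assumptions chosen for illustration.

```python
import secrets

# Illustrative field modulus (a Mersenne prime); not a parameter taken from Jana.
PRIME = 2**61 - 1

def share(value, n_parties=3):
    """Split value into n additive shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def reconstruct(shares):
    """Recombine all shares; any proper subset looks uniformly random."""
    return sum(shares) % PRIME

def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally, with no communication."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]

# Hypothetical private inputs, each split among three parties.
salaries = [52_000, 61_500, 48_250]
shared_salaries = [share(s) for s in salaries]

total_shares = shared_salaries[0]
for s in shared_salaries[1:]:
    total_shares = add_shared(total_shares, s)

total = reconstruct(total_shares)
print(total)  # 161750
```

Addition is essentially free in this setting because each party works only on its own shares. Multiplications and comparisons, by contrast, generally require rounds of interaction between the parties, which is one plausible reason that comparison-heavy operations such as JOINs are costly under secret sharing.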
The primary purpose of Jana was as a research platform to study the trade-space of privacy versus query performance for diverse database sizes. Jana fulfilled that need, but also showed its value in real-world prototype applications. One such prototype was used in a study sponsored by the Bipartisan Policy Center. This study focused on inter-agency sharing of sensitive data for developing public policy, such as resource allocation for public good.
As part of the Jana project, Galois released an open-source version of Jana for use in an upper-level class about secure computation at Columbia University. This was the first class at a major university to receive hands-on access to a secure computation platform while needing no direct support from the system’s creators. We hope that this will contribute to CS education on a broader scale in the future.
While the Jana PDaaS system was a key part of the overall Jana project, our team built prototypes and demonstrated practical performance using technologies divergent from that system. Most notably, the Jana team brought its expertise to bear on several prototypes that employed Private Set Intersection (PSI) techniques. In PSI, two or more groups willing to share data (but needing to retain the privacy of that data) conduct a protocol that reveals, to parties agreed on in advance, only the common data held by the parties (which we call the intersection of their separate data sets) or some function of that intersection.
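The idea behind PSI can be sketched with a classic two-party construction based on Diffie-Hellman-style commutative blinding. This is a toy under stated assumptions, not one of the protocols used in the Jana prototypes: the modulus is far too small for real security, the parties are assumed to follow the protocol honestly, and the names `alice_items`, `bob_items`, `hash_to_group`, and `blind` are hypothetical.

```python
import hashlib
import secrets

# Toy modulus (the Mersenne prime 2^127 - 1); real deployments use
# standardized groups or elliptic curves with vetted parameters.
P = 2**127 - 1

def hash_to_group(item):
    """Map an item to a quadratic residue modulo P."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return pow(int.from_bytes(digest, "big") % P, 2, P)

def blind(elements, key):
    """Raise every element to a secret exponent; exponentiation commutes."""
    return [pow(e, key, P) for e in elements]

alice_items = ["ann", "bob", "cy"]
bob_items = ["bob", "cy", "dee"]

alice_key = secrets.randbelow(P - 2) + 1
bob_key = secrets.randbelow(P - 2) + 1

# Round 1: each party blinds its own hashed items and sends them over.
alice_once = blind([hash_to_group(x) for x in alice_items], alice_key)
bob_once = blind([hash_to_group(x) for x in bob_items], bob_key)

# Round 2: each party blinds the other's values with its own key.
# Bob returns Alice's values in the same order he received them.
alice_twice = blind(alice_once, bob_key)
bob_twice = set(blind(bob_once, alice_key))

# Equal items yield equal doubly blinded values, so Alice can match them
# without seeing Bob's non-matching items in the clear.
intersection = [
    item for item, value in zip(alice_items, alice_twice) if value in bob_twice
]
print(intersection)  # ['bob', 'cy']
```

In this sketch only Alice learns the intersection. Variants that reveal just the size of the intersection, or some other function of it, follow the same pattern but shuffle or further process the blinded values before comparison.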
One such prototype was a naval conflict avoidance tool designed to keep navies conducting exercises in the same general area from accidentally engaging one another, while keeping their exercise plans private. Another prototype enables data sharing among government organizations to analyze longitudinal outcomes where different organizations hold diverse data about the same individuals, while preventing organizations from learning anything else about each other’s databases. Yet another prototype enables de-confliction of network resources in cyber-security settings, allowing parties to learn only how many other parties plan on utilizing the same resource.
Jana, our project’s namesake, is the two-faced Roman goddess of secrets and doorways. Some say that one of her faces looks always to the future, and one to the past. We at Galois look to the future as well: a future where organizations can analyze data contributed by all, while revealing nothing except the beneficial results of that analysis.
Today, policy and statute largely enshrine the notion that such a future is not possible. From Jana and our other continuing projects, we at Galois know otherwise. In future projects and technology transition efforts, our secure computation team aims to show how and where computing on encrypted data can achieve those privacy goals while still delivering practical performance for the data mining problems of the future.
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) and Space and Naval Warfare Systems Center, Pacific (SSC Pacific) under Contract No. N66001-15-C-4070. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA or SSC Pacific.