Can secure computation balance data privacy and utility?
While legitimate fears about data vulnerability have limited agencies' attempts to share information, some are turning to new cryptographic techniques to protect privacy while data is processed.
Your secret stuff just got pwned. Well, not really “just.” Data brokers, social media and plenty of other random companies on the internet have been buying, trading and letting others steal our private data for years. Worse, we’re allowing them to do it in exchange for services rendered. It says so right there in the privacy policy.
The good news is that we’re finally waking up. A 2018 Parks Associates study found that almost 40 percent of broadband households strongly believe it is impossible to keep data private from companies whose products they use, while over half of consumers strongly believe that they do not get much in return for giving up their data. Hopefully, the nation's leaders will use this newfound concern to follow and extend California’s move to protect privacy -- in ways even more meaningful than the promising 55,000-word start of the European Union's General Data Protection Regulation.
The disappointing thing is that we could do so much more with private data than misuse it to sell advertising and defraud insurers. Imagine a world where corporations had the confidence to share sensitive network and cyberattack data in real time to mitigate multi-target attacks; where pharmacogenomics could intermix genotypes and smart molecule intellectual property to prioritize and customize effective therapies while protecting the confidentiality of patients and pharma alike; where the IRS, Census Bureau, Department of Education and National Student Clearinghouse could link data to quantify the benefits and risks of college choices for students without putting collected data at risk.
One place that privacy does seem to matter is in government. By policy and statute, local and federal agencies actually do aim to assure the privacy of their constituents' personal data. Some jurisdictions go further, seeking to maintain that privacy while at the same time leveraging and cross-linking that data to learn, make decisions and deliver valuable services to those constituents. But therein lies a conundrum: today, leveraging data means first exposing it across agencies, which undoes the very privacy those policies aim to protect.
The state of information security today
Agencies that see the value in sharing data while ruthlessly guarding privacy have (we hope) already solved the problems of keeping that data secure in transit and at rest. Traditional encryption technologies such as symmetric and public key encryption, along with other thoughtful cyber hygiene, can handle these tasks well -- if they’re actually deployed.
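For the technically curious, here is what that baseline looks like in practice. The sketch below uses Python's third-party cryptography package purely as a stand-in for any well-reviewed encryption library, and the record it protects is invented; it is a minimal illustration, not any agency's deployment.

```python
# A minimal sketch of symmetric encryption for data at rest.
# Assumes the third-party "cryptography" package (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, held in a key-management system
cipher = Fernet(key)

record = b'{"name": "Jane Doe", "ssn": "123-45-6789"}'   # invented record
token = cipher.encrypt(record)            # safe to store or transmit
assert cipher.decrypt(token) == record    # only key holders recover the data
```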
However, until now there has been little choice but to decrypt data in order to process it. The problem is that once decrypted, that data is no longer private: insider threats and external attackers can steal it, accidents can reveal it and, because thorough deletion of data is nearly impossible, those risks extend indefinitely into the future. Legitimate fears about such data vulnerability have limited the extent to which agencies are willing to share it with other agencies or the private sector.
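That gap is easy to see in code. Continuing the hypothetical sketch above with invented salary figures: conventional ciphertext is opaque, so even a simple average forces full decryption first, and the plaintext sits exposed for the duration of the analysis.

```python
# Sketch of the decrypt-to-process problem with conventional encryption.
from cryptography.fernet import Fernet

cipher = Fernet(Fernet.generate_key())
encrypted = [cipher.encrypt(str(s).encode()) for s in (52000, 61000, 58000)]

# Ciphertexts are opaque bytes; no arithmetic is possible on them directly.
# To compute even an average, every record must first be decrypted ...
salaries = [int(cipher.decrypt(token)) for token in encrypted]
print(sum(salaries) / len(salaries))

# ... and at this moment the sensitive values sit in memory in the clear,
# exposed to insider threats, external attackers and accidental logging.
```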
Several approaches are employed today to avoid such theft or inadvertent leakage. Perhaps the best known is de-identification: the removal or obfuscation of parts of the data that are particularly sensitive, or that tie the data to individuals or organizations. However, study after study shows that de-identification doesn’t prevent re-identification. In addition, de-identification must be done anew each time the data is used for a different purpose, and it can thwart exactly the cross-dataset linking needed to generate accurate answers and the precise data cleaning needed during analysis. In short, de-identification doesn’t work, is expensive and destroys data utility.
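A toy example makes the failure concrete. In the sketch below (all records invented), the name is hashed away, yet the quasi-identifiers left behind -- ZIP code, birth date and sex -- let an attacker holding a public dataset such as a voter roll re-link the sensitive value anyway.

```python
# Toy sketch of de-identification and why it fails.
import hashlib

records = [{"name": "Jane Doe", "zip": "02134", "dob": "1971-07-04",
            "sex": "F", "diagnosis": "asthma"}]    # invented record

def deidentify(row):
    out = dict(row)
    out["name"] = hashlib.sha256(row["name"].encode()).hexdigest()[:8]
    return out

released = [deidentify(r) for r in records]

# Re-identification: join the "anonymous" release against a public dataset
# on the quasi-identifiers that were left intact.
voter_roll = [{"name": "Jane Doe", "zip": "02134",
               "dob": "1971-07-04", "sex": "F"}]
for pub in voter_roll:
    for rel in released:
        if (pub["zip"], pub["dob"], pub["sex"]) == (rel["zip"], rel["dob"], rel["sex"]):
            print(pub["name"], "->", rel["diagnosis"])  # sensitive value re-linked
```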
Another current approach is to create synthetic versions of sensitive data -- matching statistical distributions of various data attributes -- and then sharing only the synthesized substitute. The problem here is that the synthesis process can only model distributions that are explicitly chosen and known in advance. Meaningful correlations can be lost, hiding exactly the relationships that analysts want to discover.
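A sketch with invented numbers shows the failure mode: synthesizing each attribute from its own marginal distribution preserves the columns individually but erases the relationship between them.

```python
# Sketch: naive synthesis preserves marginals but destroys correlations.
import random

random.seed(0)
ages = [25 + i for i in range(40)]                            # invented data
incomes = [1000 * a + random.gauss(0, 2000) for a in ages]    # age-linked

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

# "Synthesis" that models each column independently: same marginals, reshuffled.
synth_ages = random.sample(ages, len(ages))
synth_incomes = random.sample(incomes, len(incomes))

print(pearson(ages, incomes))              # close to 1.0: real relationship
print(pearson(synth_ages, synth_incomes))  # near 0: relationship destroyed
```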
Even though current methods make data sharing risky, real-world examples show that agencies see value in being able to share. In the U.S., for example, juvenile justice is a popular area for data sharing. Thirty-five states have laws reaching back to the 1990s that permit data sharing in search of improved outcomes. Twenty-seven of those states share data across their child welfare and juvenile justice systems for the same purpose.
So. Sharing sensitive data offers promising value yet is risky because of the need to decrypt it for processing, and current methods fail to mitigate those risks. What to do? Some agencies are turning to new cryptographic techniques called secure computation to protect privacy while data is processed.
Secure computation is a promising alternative, though performance and usability are still being improved. Here, data is shared in full, so no utility is lost and no recurring preparation effort is needed. However, the data remains encrypted at all times -- even during computation -- and even while results are filtered by access control rules. With secure computation, input data is never “in the clear,” even if analysis platforms are hacked.
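To make "never in the clear" less abstract, the sketch below shows additive secret sharing, one classic building block of secure multiparty computation. It is illustrative only -- invented salaries, three hypothetical servers -- and not the design of any product mentioned in this article.

```python
# Sketch of additive secret sharing, a secure-computation building block.
# Each data owner splits its value into random shares that sum to the value
# modulo a prime; no single server ever sees a real input.
import secrets

P = 2**61 - 1        # a large prime modulus
N_SERVERS = 3        # hypothetical non-colluding servers

def share(value):
    shares = [secrets.randbelow(P) for _ in range(N_SERVERS - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

salaries = [52000, 61000, 58000]            # invented inputs
all_shares = [share(s) for s in salaries]

# Each server sums only the shares it holds; individual salaries stay hidden.
server_totals = [sum(column) % P for column in zip(*all_shares)]

# Recombining the per-server totals reveals only the aggregate.
print(sum(server_totals) % P)   # 171000, yet no server saw any one salary
```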
Current and potential applications
Pilot projects now underway tap secure computation for a variety of use cases involving sensitive data. In 2018, the Department of Homeland Security's Science & Technology Directorate awarded a contract for the development of a tool suite that allows sensitive cybersecurity data to be shared and analyzed while it remains fully encrypted and thus private.
In that same year, the Bipartisan Policy Center announced a pilot project to apply secure computation to evidence-based policymaking at the county level, bringing together several encrypted datasets to analyze relationships among public health and human services outcomes.
Proof-of-concept projects at the Census Bureau and in the Department of Defense also show the nascent capabilities and benefits of this new technology. More applications of secure computation are ready for pilot programs as well. Preventing satellite collisions without revealing satellite trajectory data is a promising example. Even the private sector is taking note: In Boston, the Women’s Workforce Council piloted secure computation in an ongoing analysis of salary differentials and fair pay.
The government has a short-fuse opportunity to combine its commitment to privacy, the emerging capability of secure computation and these nascent pilot programs to deliver new levels of evidence-based action for the public good -- and to push back the gathering dark of privacy obliteration in the private sector. More pilot programs are needed, along with extensive dialog and study on how to integrate such disruptive new technologies into policy and practice, so that agency administrators can both comply with statute and grow comfortable revising policy.