The unrealized potential of interagency total information sharing
Effective security solutions need to strike a balance between data sharing and privacy protection.
Significant events tend to trigger significant reactions. In the immediate aftermath of 9/11, fingers pointed to the lack of information sharing among government agencies at every level as the root of our inability to predict the attack, which unsurprisingly led to calls for total information sharing (TIS). Then, on the heels of a string of massive data breaches, many began questioning whether the pendulum had swung too far toward sharing information at the expense of data privacy and data security.
The recent and horrific terrorist attacks in Paris will once again prompt meaningful discussions by government agencies on the right balance between TIS and data privacy protection. Already we’ve seen calls in the European press for improved international intelligence sharing to compensate for the lack of border controls within the European Union. This is germane not only for agencies, but for industry as well -- particularly government contractors tasked with providing technology guidance and solutions. To strike an effective balance between TIS and data privacy and security, there are a handful of key considerations to keep in mind.
TIS not living up to potential
Solutions, products and technologies developed to enable better information sharing have had a positive and meaningful impact on national security but have not fulfilled their potential. Weak solutions have led to ineffective use of data that was shared, refusal to share data in the first place, and misuse and theft of data once it was shared. It’s no wonder that agencies remain reluctant to share sensitive information. While refusal to share is easy to understand, ineffective sharing and misuse of data are worth explaining further.
Ineffective sharing. Sharing is ineffective when the data shared is out of date, anonymized beyond usability or filtered to remove significant useful attributes. For example, static copies of data are often exported by one agency for use by another, instead of providing “live” access. While perhaps useful for a short time, such data can quickly grow outdated, defeating the purpose of sharing -- especially in the case of dynamic operational data.
As another example, key attributes are commonly changed to protect sensitive values such as those that uniquely identify individuals, intelligence gathering methods or information sources. Such anonymized data often has limited utility when correlating against other data sets, particularly when the anonymized attributes are the ones used for correlation.
Misuse of data. Once data is shared, even statutory restrictions are ineffective at controlling how the data is used, especially if it’s stolen. Putting the genie back in the bottle just doesn’t work. Stolen identities, kidnappings, job loss and a host of other troubles have come from misuse of shared data. Unfortunately, anonymizing data doesn’t prevent these troubles. It’s well known that gender, date of birth and ZIP code together uniquely identify about 87 percent of individuals in the United States (see aboutmyinfo.org to try it for yourself). It’s also well known that just a few historical samples from the GPS unit in your phone can predict your location with startling accuracy. Far more powerful predictive and identifying ability comes from the common practice of correlating datasets, even anonymized ones.
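The linkage risk described above can be illustrated with a small sketch. Every record, name and attribute value below is hypothetical; the point is that an “anonymized” dataset and a public one that share the same quasi-identifiers (gender, date of birth, ZIP code) can be joined to recover identities:

```python
# Hypothetical linkage (re-identification) attack: an "anonymized" medical
# dataset omits names, but a public roll sharing the same quasi-identifiers
# lets an attacker join the two and recover who is who.

anonymized_records = [
    {"gender": "F", "dob": "1961-07-02", "zip": "02138", "diagnosis": "X"},
    {"gender": "M", "dob": "1975-03-14", "zip": "90210", "diagnosis": "Y"},
]

public_roll = [
    {"name": "Jane Doe", "gender": "F", "dob": "1961-07-02", "zip": "02138"},
    {"name": "John Roe", "gender": "M", "dob": "1980-01-01", "zip": "10001"},
]

def reidentify(anon, public):
    """Join the datasets on quasi-identifiers; each unique match
    attaches a name to a supposedly anonymous record."""
    index = {(p["gender"], p["dob"], p["zip"]): p["name"] for p in public}
    hits = []
    for rec in anon:
        key = (rec["gender"], rec["dob"], rec["zip"])
        if key in index:
            hits.append((index[key], rec["diagnosis"]))
    return hits

# Jane Doe is re-identified despite removal of her name.
matches = reidentify(anonymized_records, public_roll)
```

No cryptography is broken here; the “anonymization” simply left enough correlatable attributes behind.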
The need for ‘assured’ data privacy
The federal government is signaling the need for assured data privacy, in part because of recent realizations that current TIS approaches are ineffective, dangerous or both. The Defense Advanced Research Projects Agency‘s Brandeis program is an ambitious effort to unlock the full potential of big data while protecting the privacy of those whom the data describes. Brandeis aims to address privacy-preserving data sharing across government, enterprise and consumer markets -- a signal that assuring data can be kept private is as critical to agencies as assuring data can be shared.
Named for former Supreme Court Justice Louis Brandeis, who as a Harvard Law student penned the seminal essay “The Right to Privacy,” the vision of the Brandeis program is to break the tension between privacy and data sharing by enabling safe and predictable sharing of data in which privacy is preserved.
Brandeis is not the only initiative of its kind. The recently concluded Security and Privacy Assurance Research program from the Intelligence Advanced Research Projects Activity and DARPA’s Programming Computation on Encrypted Data program also addressed privacy-preserving computation and analytics. These and other programs strongly suggest that current TIS strategies are failing to meet the needs of many agencies today and appear inadequate for the data sharing opportunities of tomorrow.
Where can secure, effective data sharing take us in the future? Imagine personalized medicine that combines patient genetic and phenotypic profiles with pharma data on candidate therapies to design personal cancer treatments without revealing patient data. Imagine computer network situational awareness enabled by companies sharing cyberattack and network loading data without revealing their own vulnerabilities and capabilities to adversaries or competitors. Without assured privacy, such possibilities would (and should) face systemic opposition. With assured privacy, however, would you allow your data to be used to gain these benefits? I would.
There is a definitive need for technologies that provably prevent shared data from being used for anything other than its intended purpose. It takes a combination of technologies to achieve this goal. One such technology family that is on the verge of readiness is secure computation. For example, two-party secure computation allows two users to jointly agree on a computation, and jointly carry it out in a way that neither learns anything about the other’s inputs (except what might be inferred from seeing the result).
As another example, secure multi-party computation allows multiple parties to contribute data to a computation performed in the cloud, and enables those inputs to remain completely secure even though the participating cloud servers may not be trusted. This sort of work enabled the Estonian government to detect tax fraud and to study the relationship between success in college and holding a job while being a student, all without revealing data across agency boundaries.
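One basic building block behind such multi-party systems is additive secret sharing: each party splits its input into random shares that sum to the true value modulo a public prime, so no single server ever sees an input in the clear, yet the servers can jointly compute an aggregate. A minimal sketch (the prime, server count and income figures are illustrative assumptions, not any deployed system’s parameters):

```python
# Toy additive secret sharing, the simplest form of secure multi-party
# computation. Real deployments add authentication, malicious-party
# defenses and richer operations; this sketch shows only the core idea.
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a public prime

def share(secret, n_servers=3):
    """Split a secret into n random shares that sum to it mod PRIME.
    Any n-1 shares look uniformly random and reveal nothing."""
    shares = [random.randrange(PRIME) for _ in range(n_servers - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def secure_sum(all_shares):
    """Each server locally adds the one share it holds of every input;
    combining the per-server totals reveals only the aggregate."""
    per_server = [sum(column) % PRIME for column in zip(*all_shares)]
    return sum(per_server) % PRIME

incomes = [52_000, 87_000, 61_000]        # three parties' private inputs
shared = [share(x) for x in incomes]      # one share of each input per server
total = secure_sum(shared)                # equals sum(incomes); inputs stay hidden
```

Statistics such as the Estonian tax and education studies mentioned above are built from exactly this kind of primitive: aggregates come out, individual inputs never do.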
The path forward for TIS systems
Security in general, and secure data sharing in particular, can’t be “sprinkled on top” of TIS solutions like fairy dust. Instead, security and supporting technologies need to be designed in from the beginning. Building private TIS systems that strike the right balance requires that these systems include a number of security capabilities:
Keeping data unintelligible by default while in transit and at rest. Encryption is the standard for keeping data private. In transit, solutions such as virtual private networks, secure shell and transport layer security can provide this protection, so long as the encryption is strong enough; 2,048-bit public keys are the minimum any contractor should consider for sensitive data. At rest, solutions such as the Advanced Encryption Standard have a strong track record; AES-256 with an appropriate choice of encryption mode should be the standard here.
Where possible, keeping data encrypted even while in use during computation. An increasing number of analytics and applications can be performed while data remains encrypted, though at some cost in performance. Secure computation should be a ground-level consideration for all TIS products.
Auditing all access and allowing only authorized computation on data. Password protection is simply not enough. All cloud-stored data and all shared data should be access-protected by at least two-factor authentication, where one of those factors changes frequently or is derived from a current bio-signature of the user.
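One widely used second factor that “changes frequently” is the one-time password family standardized in RFC 4226 (HOTP) and RFC 6238 (TOTP). A minimal standard-library sketch of the idea, not a production authenticator:

```python
# HMAC-based and time-based one-time passwords (RFC 4226 / RFC 6238).
# The shared secret below is the RFC 4226 test-vector secret, used here
# purely for illustration.
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226: derive a short code from a shared secret and a counter."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                       # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)

def totp(secret: bytes, period: int = 30) -> str:
    """RFC 6238: the counter is the current 30-second window, so the
    factor rotates automatically -- the property recommended above."""
    return hotp(secret, int(time.time()) // period)

# RFC 4226 test vector: counter 0 yields "755224" for this secret.
code = hotp(b"12345678901234567890", 0)
```

Because the code is recomputed from the shared secret each period, a stolen password alone no longer grants access to shared data.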
Logging access securely. Defense in depth remains important until we can provably claim perfect security. Knowing who accessed the data is an important backup protection. Cryptographic blockchains, such as the one underlying Bitcoin, can also provide a tamper-evident, secure record of who accessed sensitive files.
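The tamper-evidence property that blockchains provide can be sketched in miniature as a hash-chained log, where each entry commits to the hash of its predecessor. This is a simplified single-machine illustration of the idea, not a distributed ledger:

```python
# Hash-chained audit log: editing any past entry breaks the chain,
# so after-the-fact tampering is detectable on verification.
import hashlib
import json

class AuditLog:
    """Append-only access log; a minimal sketch, not a product."""
    def __init__(self):
        self.entries = []

    def append(self, who: str, what: str):
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        record = {"who": who, "what": what, "prev": prev}
        record["hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        self.entries.append(record)

    def verify(self) -> bool:
        """Recompute every hash and link; False means tampering."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: e[k] for k in ("who", "what", "prev")}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True

log = AuditLog()
log.append("analyst7", "read case-file-12")
log.append("analyst9", "read case-file-12")
ok_before = log.verify()               # chain is intact
log.entries[0]["what"] = "nothing"     # attempted cover-up...
ok_after = log.verify()                # ...breaks the chain
```

A public blockchain adds distribution and consensus on top of this same chaining idea, so no single administrator can quietly rewrite history.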
Helping data owners understand and decide access policies for their data. If you’re like most people, when pressed about which uses of private data you will allow, you likely throw up your hands and say something like, “Just don’t let my data be used against me!” That same frustration makes people wince when you ask, “When did you last read Google’s privacy policy for your data?” TIS policy faces these same problems. This is a current area of research, but one where TIS vendors can offer substantial value by applying machine learning and human-machine interaction techniques to policy refinement.
It may seem like a complete turnaround from TIS to privacy assurance, but without the latter, the former often isn’t happening. We know it’s possible to simultaneously protect and share data, so we know effective TIS is possible. Government and industry organizations that reimagine TIS as the product of secure data-sharing technology have the rare opportunity to make effective TIS a reality.