Do’s and don’ts of data de-identification
Data use standards and purpose-built software can help agencies protect individuals’ privacy while allowing for meaningful statistical analysis, NIST says.
Government agencies go to great lengths to protect personally identifiable information in the data they collect. To ensure de-identified data cannot be reverse-engineered to reveal individuals’ sensitive information, the National Institute of Standards and Technology is updating its guidance to address advances in privacy technology in the six years since the last version was issued.
De-identification lets researchers prevent or limit privacy risks to the individuals whose personal data is contained in a dataset while still allowing for meaningful statistical analysis.
Agencies may rely on models such as k-anonymity or differential privacy as safeguards against prying eyes, but advances in geolocation technology, for example, can allow outsiders—including identity thieves, journalists looking for confidential details or researchers hoping to distinguish their findings from others—to break down those barriers, according to NIST.
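To make the k-anonymity idea concrete, here is a minimal sketch in Python. It is an illustration only, not code from the NIST guidance. The function name `k_anonymity`, the field names and the sample records are all hypothetical. A dataset is k-anonymous with respect to a set of quasi-identifiers (fields such as ZIP code or age band that could be linked to outside data) if every combination of those values is shared by at least k records.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values for every quasi-identifier (0 for an empty dataset)."""
    if not records:
        return 0
    # Count how many records fall into each quasi-identifier combination.
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    # The dataset's k is the size of its smallest group.
    return min(groups.values())


# Hypothetical, already-generalized records (truncated ZIP, age bands).
records = [
    {"zip": "021**", "age_band": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age_band": "30-39", "diagnosis": "asthma"},
    {"zip": "021**", "age_band": "40-49", "diagnosis": "flu"},
    {"zip": "021**", "age_band": "40-49", "diagnosis": "diabetes"},
    {"zip": "946**", "age_band": "30-39", "diagnosis": "flu"},
]

k = k_anonymity(records, ["zip", "age_band"])
print(f"k = {k}")
```

In this sample the last record is the only one in its ZIP and age group, so the dataset is only 1-anonymous: that person could be singled out by anyone who knows those two facts. Which fields count as quasi-identifiers is itself a judgment call, which is one reason a dataset can appear safe and still be re-identifiable.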
Agencies should perform de-identification using software specifically designed for that purpose with the guidance of trained individuals, NIST recommended. “While it is possible to perform de-identification with off-the-shelf software like a commercial spreadsheet or financial planning program, such programs typically lack the key functions required for proper de-identification,” the guidance stated. Failure to use the appropriate software “may result in a dataset that appears de-identified but that still contain significant disclosure risks.”
When it comes to choosing such software, NIST recommended considering the tradeoff between data usability and privacy protection and minimizing the chances for tool and user error. Agencies should assess the tool’s user-friendliness, as well as its ability to work with chosen algorithms and prevent unexpected data leaks. Potential users should also evaluate the tool’s efficiency when running on different datasets and confirm that it produces the same results when run twice on the same dataset or by two different users.
Disclosure review boards consisting of legal and technical privacy experts and organization stakeholders and leaders can also help agencies with privacy protection, NIST said.
Adopting de-identification standards is another crucial strategy for protecting sensitive data. For example, setting accuracy goals allows stakeholders to align de-identification techniques with their agencies’ intended use of the data. Standards like this “can make the data sufficiently accurate for the intended purpose but not unnecessarily more accurate, which can limit the amount of privacy loss,” the authors wrote. NIST also warned that poor standards maintenance may provide an inaccurate foundation for scientific research and policy decisions.
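The link between accuracy goals and privacy loss can be illustrated with differential privacy, one of the models mentioned earlier. The sketch below is a minimal example under stated assumptions, not an implementation from the NIST guidance. It uses the standard Laplace mechanism on a counting query, and the function names, the true count of 100 and the epsilon values are all hypothetical. A smaller epsilon means stronger privacy but noisier answers, so an agency that knows how much error its intended use can tolerate can avoid spending more privacy than it needs.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential samples with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Release a count with Laplace noise.

    Assumes a counting query, where adding or removing one person
    changes the result by at most 1 (sensitivity 1), so the noise
    scale is 1 / epsilon.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(epsilon, true_count=100, trials=10_000, seed=0):
    """Estimate the average absolute error of the noisy count."""
    rng = random.Random(seed)  # fixed seed so repeated runs match
    total = sum(
        abs(noisy_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    )
    return total / trials


for epsilon in (0.1, 1.0, 10.0):
    print(f"epsilon={epsilon}: mean abs error ~ {mean_abs_error(epsilon):.2f}")
```

The average error works out to roughly 1 divided by epsilon. If an agency's accuracy goal for a published count is an error of about 1, an epsilon near 1 is sufficient in this simplified setting, and a larger value would give up privacy without a matching analytical benefit.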
“Decisions and practices regarding the de-identification and release of government data can be integral to the mission and proper functioning of a government agency,” NIST said. “As such, an agency’s leadership should manage these activities in a way that assures performance and results in a manner that is consistent with the agency’s mission and legal authority.”
NIST is looking for comments on the third draft of SP 800-188, released Nov. 15. Comments are due Jan. 15, 2023.