Magazine Summer 2022 Issue Frontiers

Preserving Privacy While Sharing Data

Differential privacy can safeguard personal information when data is being shared, but it requires a high level of expertise.

Simson L. Garfinkel and Claire McKay Bowen, April 26, 2022

Frontiers

An MIT SMR initiative exploring how technology is reshaping the practice of management.

The problem, computer scientists have discovered, is that the more information an organization releases, the more likely it is that personally identifiable information can be uncovered, no matter how well those details are protected. It turns out that protecting privacy and publishing accurate and useful data are inherently in opposition.
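To make the problem concrete, here is a minimal sketch of a classic "differencing attack" on a hypothetical four-person data set. The names, the yes/no attribute, and the helper `count_with_condition` are illustrative assumptions, not taken from the article. Two exact counts, each harmless on its own, differ by only one person. Subtracting one from the other reveals that person's private value.

```python
# Hypothetical toy records: (name, has_condition). Illustrative data only.
records = [
    ("Alice", True),
    ("Bob", False),
    ("Carol", True),
    ("Dave", False),
]

def count_with_condition(rows):
    """Exact (noise-free) count of people who have the condition."""
    return sum(1 for _, has_condition in rows if has_condition)

# Release 1: the count over everyone.
total = count_with_condition(records)

# Release 2: the same count over everyone except Carol
# (e.g., a slightly different geography or time window).
without_carol = count_with_condition([r for r in records if r[0] != "Carol"])

# Subtracting two exact statistics exposes Carol's private attribute.
carol_has_condition = (total - without_carol) == 1
print(total, without_carol, carol_has_condition)  # prints: 2 1 True
```

Each extra exact statistic gives an attacker another equation to combine with the others. This is why publishing more accurate numbers steadily raises the risk of re-identification.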

In an effort to tackle this dilemma, computer scientists have developed a mathematical approach called differential privacy (DP), which works by making that trade-off explicit: To ensure that privacy is protected, some accuracy in the data has to be sacrificed. What’s more, DP gives organizations a way to measure and control the trade-off. Many researchers now regard DP as the gold standard for privacy protection because it allows data publishers to release statistics or create new data sets while controlling the degree to which privacy may be compromised.
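For readers who want the formal statement behind this trade-off, the standard definition in the DP literature is given below as background; the article itself does not state it. A randomized algorithm $\mathcal{M}$ is $\varepsilon$-differentially private if, for every pair of data sets $D$ and $D'$ that differ in one person's record, and for every set $S$ of possible outputs:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
```

The parameter $\varepsilon$, often called the privacy-loss budget, is the numerical measure of the trade-off. A smaller $\varepsilon$ forces the two probabilities closer together, so any one person's presence or absence changes the output very little. Meeting that requirement takes more noise, which makes the results less accurate.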

How Differential Privacy Works

Invented in 2006, DP works by adding small errors, called statistical noise, either to the underlying data or to statistical results as they are computed. In general, more noise produces more privacy protection but less accurate results. While statistical noise has been used for decades to protect privacy, what makes DP a breakthrough technology is that it assigns a numerical value to the loss of privacy that occurs each time information is released. Organizations can control how much statistical noise to add to the data and, as a result, how much accuracy they’re willing to trade for greater privacy.1
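One classic way to add calibrated noise is the Laplace mechanism applied to a single count. Below is a minimal sketch of it. The function names, the hypothetical count of 1,000, and the fixed seed are illustrative assumptions. The article does not say which mechanism any particular organization uses. The privacy-loss parameter ε (epsilon) is the numerical value described above. Adding or removing one person changes a count by at most 1, so Laplace noise with scale 1/ε satisfies ε-differential privacy. Smaller ε therefore means more noise.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query (hypothetical helper).

    One person changes a count by at most 1 (sensitivity 1), so noise
    with scale 1/epsilon gives epsilon-differential privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Demo: compare accuracy at several privacy levels.
rng = random.Random(42)   # fixed seed so the demo is repeatable
true_count = 1_000        # hypothetical statistic, e.g., residents of one block
for epsilon in (0.1, 1.0, 10.0):
    errors = [abs(noisy_count(true_count, epsilon, rng) - true_count)
              for _ in range(10_000)]
    print(f"epsilon={epsilon:>4}: mean absolute error ~ {sum(errors) / len(errors):.2f}")
```

The printed mean absolute errors should land close to 1/ε (roughly 10, 1, and 0.1), which makes the privacy-accuracy trade-off visible. Each release also spends part of the privacy budget. Under basic sequential composition, publishing k such counts at ε each costs kε in total. This gives one precise sense in which releasing more statistics erodes privacy.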

The U.S. Census Bureau developed the first data product to use DP in 2008. Called OnTheMap, it provides detailed salary and commuting statistics for different geographical areas.

About the Authors

Simson L. Garfinkel is the senior data scientist in the Office of the Chief Information Officer at the U.S. Department of Homeland Security, a part-time faculty member in the data science program at George Washington University, and a member of the Association for Computing Machinery’s U.S. Technology Public Policy Committee. This article was written in his personal capacity and does not reflect the official policy of DHS. Claire McKay Bowen focuses on data privacy and confidentiality as principal research associate at the Urban Institute. Both authors formerly worked on privacy initiatives at the U.S. Census Bureau.

References

1. While we will not explore the mathematics of DP here, readers who wish to know more are directed to C.M. Bowen and S. Garfinkel, “The Philosophy of Differential Privacy,” Notices of the American Mathematical Society 68, no. 10 (November 2021): 1727-1739; and A. Wood, M. Altman, A. Bembenek, et al., “Differential Privacy: A Primer for a Non-Technical Audience,” Vanderbilt Journal of Entertainment and Technology Law 21, no. 1 (fall 2018): 209-276.

2. For a discussion of the controversy involving the deployment of DP and the 2020 U.S. Census, see S. Garfinkel, “Differential Privacy and the 2020 U.S. Census,” MIT Case Studies in Social and Ethical Responsibilities of Computing (winter 2022), mit-serc.pubpub.org.

Reprint #: 63405
