New Method Efficiently Safeguards Sensitive AI Training Data
The approach maintains an AI model’s accuracy while ensuring attackers can’t extract secret information.
Enhancing Data Privacy with PAC Privacy
Data privacy comes with a cost. There are security techniques that protect sensitive user data, like customer addresses, from attackers who may attempt to extract them from AI models — but they often make those models less accurate. Researchers at MIT have developed a framework, based on a new privacy metric called PAC Privacy, that could maintain the performance of an AI model while ensuring sensitive data remains safe from attackers.
The Benefits of Enhanced Efficiency
The team utilized their new version of PAC Privacy to privatize several classic algorithms for data analysis and machine-learning tasks. They also demonstrated that more ‘stable’ algorithms are easier to privatize with this technique. Stability in an algorithm refers to its ability to produce consistent results even when its training data are slightly modified. This stability helps an algorithm make more accurate predictions on previously unseen data.
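To make the notion of stability concrete, here is a minimal sketch (illustrative function names and parameters, not the researchers’ code) of how one could probe it empirically: rerun an algorithm on random subsamples of the same dataset and measure how much its output moves around.

```python
# A minimal, illustrative way to probe an algorithm's stability: rerun it on
# random subsamples of the training data and measure how much its output
# spreads. A smaller spread means a more stable algorithm, which the article
# notes is cheaper to privatize.
import numpy as np

def output_spread(algorithm, data, n_trials=100, subsample_frac=0.9, seed=0):
    rng = np.random.default_rng(seed)
    n = len(data)
    k = int(subsample_frac * n)
    outputs = [algorithm(data[rng.choice(n, size=k, replace=False)])
               for _ in range(n_trials)]
    return np.std(np.stack(outputs), axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(1000, 5))
    # The per-feature maximum typically spreads more across subsamples than
    # the per-feature mean, making the mean the more stable algorithm here.
    print("mean:", output_spread(lambda d: d.mean(axis=0), data))
    print("max: ", output_spread(lambda d: d.max(axis=0), data))
```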
In this context, PAC stands for Probably Approximately Correct. PAC Privacy is a metric that quantifies how hard it would be for an adversary to reconstruct any portion of the sensitive training data once an algorithm’s output has been released.
Estimating Noise for Enhanced Efficiency
To protect sensitive data that were used to train an AI model, engineers often add noise, or generic randomness, to the model so it becomes harder for an adversary to guess the original training data. However, this process reduces a model’s accuracy, so the less noise one needs to add to reach a given level of privacy, the better.
PAC Privacy automatically estimates the smallest amount of noise one needs to add to an algorithm to achieve a desired level of privacy. The new variant of PAC Privacy works by estimating output variances rather than representing the entire matrix of data correlations across outputs. This approach allows for faster computation and scaling up to larger datasets.
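As a rough illustration of the variance-based idea (a sketch built on repeated subsampled runs, not the paper’s actual estimation procedure or calibration), per-coordinate output variances can be estimated like this; for a d-dimensional output, this needs only d numbers, whereas a full covariance matrix would need d × d entries.

```python
# Sketch of the variance-estimation step: run the algorithm on many random
# subsamples of the data and record the per-coordinate variance of its
# outputs. These variances, rather than a full covariance matrix, are what
# the new variant of PAC Privacy is described as needing.
import numpy as np

def estimate_output_variances(algorithm, data, n_trials=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(data)
    outputs = [algorithm(data[rng.choice(n, size=n // 2, replace=False)])
               for _ in range(n_trials)]
    return np.var(np.stack(outputs), axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(size=(1000, 3))
    print(estimate_output_variances(lambda d: d.mean(axis=0), data))
```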

Scaling Up with Anisotropic Noise
The original PAC Privacy algorithm was limited to adding isotropic noise, which is added uniformly in all directions. The new variant, on the other hand, can add anisotropic noise, tailored to specific characteristics of the training data. This enables users to add less overall noise while maintaining the same level of privacy, boosting the accuracy of the privatized algorithm.
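The benefit of anisotropic noise can be seen in a toy comparison (the numbers below are made up, not from the paper): if one output coordinate varies far more than the others, isotropic noise must be scaled to the worst coordinate, while anisotropic noise adds only as much as each coordinate needs.

```python
# Toy comparison of isotropic vs. anisotropic Gaussian noise, assuming the
# per-coordinate output variances have already been estimated (e.g., with a
# routine like estimate_output_variances above). Values are hypothetical.
import numpy as np

variances = np.array([4.0, 0.25, 0.01])   # hypothetical per-coordinate output variances

# Isotropic: a single noise scale in every direction, driven by the most
# variable coordinate.
iso_scale = np.sqrt(variances.max()) * np.ones_like(variances)

# Anisotropic: each coordinate gets only as much noise as it needs.
aniso_scale = np.sqrt(variances)

print("total isotropic noise energy:  ", np.sum(iso_scale ** 2))    # 12.0
print("total anisotropic noise energy:", np.sum(aniso_scale ** 2))  # 4.26
```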
Exploring Win-Win Situations
The researchers hypothesize that more stable algorithms are easier to privatize with this technique. They tested this theory on several classical algorithms and found that more stable algorithms were indeed easier to privatize: they required less noise to reach the same level of privacy, so the privatized versions retained more of their accuracy.
‘We want to explore how algorithms could be co-designed with PAC Privacy, so the algorithm is more stable, secure, and robust from the beginning,’ says Srini Devadas, a senior author of the paper. The researchers also aim to test their method with more complex algorithms and further explore the privacy-utility tradeoff.
Real-World Applications
The increased efficiency of the new PAC Privacy framework, combined with a four-step template for implementation, would make the technique easier to deploy in real-world situations. This approach can be used to privatize virtually any algorithm without needing access to that algorithm’s inner workings.
‘This is a black box — you don’t need to manually analyze each individual query to privatize the results,’ says Xiangyao Yu, an assistant professor at the University of Wisconsin at Madison. The researchers are actively building a PAC-enabled database by extending existing SQL engines to support practical, automated, and efficient private data analytics.
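As a hypothetical illustration of the black-box idea (a sketch of the general pattern, not the team’s implementation, and noise_multiplier is a stand-in knob rather than the paper’s calibration), a privatization wrapper only ever needs to call the algorithm, never to inspect it.

```python
# A hypothetical black-box privatization wrapper: it treats `algorithm` as an
# opaque callable, estimates how variable its output is across subsamples,
# and returns the output plus Gaussian noise scaled to those variances.
import numpy as np

def privatize(algorithm, data, noise_multiplier=1.0, n_trials=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(data)
    runs = [algorithm(data[rng.choice(n, size=n // 2, replace=False)])
            for _ in range(n_trials)]
    variances = np.var(np.stack(runs), axis=0)
    result = algorithm(data)
    return result + rng.normal(scale=noise_multiplier * np.sqrt(variances),
                               size=result.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    data = rng.normal(loc=3.0, size=(500, 4))
    # Two unrelated "queries"; the wrapper never looks inside either of them.
    print(privatize(lambda d: d.mean(axis=0), data, noise_multiplier=0.5))
    print(privatize(lambda d: np.median(d, axis=0), data, noise_multiplier=0.5))
```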
Conclusion
The development of PAC Privacy represents a significant advancement in securing sensitive AI training data. By reducing the computational cost of estimating noise and exploiting the stability of the underlying algorithms, this technique offers a promising way to protect user data while maintaining model accuracy.