Researchers at MIT have developed a new technique to reduce bias in AI models while preserving or improving accuracy. This technique identifies and removes the training examples that contribute most to a model’s failures on minority subgroups.
Machine-learning models can fail when trying to make predictions for individuals who were underrepresented in the datasets they were trained on. For instance, a model predicting the best treatment option for someone with a chronic disease may be trained using a dataset that contains mostly male patients, leading to incorrect predictions for female patients.
The MIT researchers combined two ideas into an approach that identifies and removes these problematic datapoints. Their goal is to reduce worst-group error, which occurs when a model underperforms on minority subgroups in a training dataset.
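As a rough illustration, worst-group error can be measured by splitting held-out examples by subgroup and taking the error rate of the worst-performing group. The sketch below assumes NumPy arrays of labels, predictions, and subgroup identifiers; the names are illustrative, not from the paper.

```python
import numpy as np

def worst_group_error(y_true, y_pred, groups):
    """Return the highest error rate over all subgroups.

    y_true, y_pred, and groups are 1-D arrays of equal length;
    `groups` holds a subgroup identifier for each example.
    """
    errors = []
    for g in np.unique(groups):
        mask = groups == g
        # Error rate within this subgroup
        errors.append(np.mean(y_pred[mask] != y_true[mask]))
    return max(errors)

# Example: a model that is accurate overall but poor on group 1
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 0, 1, 0, 0, 0])
groups = np.array([0, 0, 0, 1, 1, 1])
print(worst_group_error(y_true, y_pred, groups))  # 0.667: group 1 is the worst group
```

A model can have low average error while this quantity stays high, which is exactly the failure mode the researchers target.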
The technique builds on prior work that introduced a method called TRAK, which identifies the training examples most responsible for a specific model output. The researchers take the incorrect predictions the model makes on minority subgroups and use TRAK to determine which training examples contributed most to those errors.
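In broad strokes, the workflow might look like the sketch below: score each training example by how much it contributed to the model's mistakes on minority-subgroup examples, drop the highest-scoring ones, and retrain. The `attribution_scores` function is a placeholder standing in for a data-attribution method such as TRAK; its interface here is an assumption for illustration, not the library's actual API.

```python
import numpy as np

def attribution_scores(train_set, target_examples):
    """Placeholder for a data-attribution method such as TRAK.

    Expected to return a matrix of shape
    (len(target_examples), len(train_set)), where entry [i, j]
    estimates how much training example j contributed to the
    model's output on target example i.
    """
    raise NotImplementedError("Plug in a real attribution method here.")

def select_examples_to_remove(train_set, minority_errors, k):
    """Pick the k training examples that contribute most to the
    model's incorrect predictions on minority-subgroup examples."""
    scores = attribution_scores(train_set, minority_errors)
    # Aggregate each training example's contribution across all errors
    total_contribution = scores.sum(axis=0)
    # Indices of the k most harmful training examples
    return np.argsort(total_contribution)[-k:]

# Usage sketch (train/retrain loop assumed to exist elsewhere):
# to_drop = set(select_examples_to_remove(train_set, minority_errors, k=500))
# pruned_train_set = [ex for i, ex in enumerate(train_set) if i not in to_drop]
# model = retrain(pruned_train_set)
```

The key design choice is that only the training examples driving the subgroup errors are removed, rather than entire underrepresented or sensitive categories, which is why overall accuracy can be preserved.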
The researchers’ technique is an accessible and effective approach to improving fairness in machine-learning models. By identifying and removing specific points in a training dataset, it maintains the model’s overall accuracy while boosting its performance on minority subgroups. The technique can be applied to many types of models and has the potential to improve outcomes in various fields, including healthcare.