Knowing when to trust a model’s predictions is not always easy for professionals who use machine-learning models to aid decision-making, especially since these models are often so complex that their inner workings remain a mystery.
Selective regression is a method in which the model estimates its confidence in each prediction and rejects a prediction if that confidence is too low. A person can then review those cases, gather additional information, and decide on each one manually.
While researchers work on new models, regulators are trying to set standards for the use of artificial intelligence: two months ago we discussed the EU AI Act, and now the UK is preparing its AI rulebook.
With selective regression, the likelihood of AI making the correct prediction rises
Although selective regression has been shown to improve a model’s overall performance, researchers at MIT and the MIT-IBM Watson AI Lab have found that the method can have the reverse effect for racial and ethnic groups that are underrepresented in a dataset. With selective regression, the likelihood that the model makes the correct prediction rises along with its confidence, but this is not necessarily the case for all subgroups.
For example, a model that predicts loan approvals may make fewer mistakes on average, yet generate more inaccurate predictions for Black or female applicants. This can happen for a number of reasons, including that the model’s confidence measure is inaccurate for these underrepresented groups because it was developed using data from the overrepresented groups.
After discovering this issue, the MIT researchers created two algorithms to address it. Using real-world datasets, they demonstrate that the algorithms reduce the performance disparities that affect underrepresented minorities.
“Ultimately, this is about being more intelligent about which samples you hand off to a human to deal with. Rather than just minimizing some broad error rate for the model, we want to make sure the error rate across groups is taken into account in a smart way,” explained senior MIT author Greg Wornell, the Sumitomo Professor in Engineering in the Department of Electrical Engineering and Computer Science (EECS) who leads the Signals, Information, and Algorithms Laboratory in the Research Laboratory of Electronics (RLE) and is a member of the MIT-IBM Watson AI Lab.
Wornell is joined on the paper by co-lead authors Abhin Shah, an EECS graduate student, and Yuheng Bu, a postdoc, as well as Joshua Ka-Wing Lee SM ’17, ScD ’21 and Subhro Das, Rameswar Panda, and Prasanna Sattigeri, research staff members of the MIT-IBM Watson AI Lab. The study will be presented at this month’s International Conference on Machine Learning (ICML).
Regression is a method for estimating the relationship between a dependent variable and independent variables, and it is frequently used in machine learning for prediction problems, such as estimating the price of a home given its attributes (number of bedrooms, square footage, etc.). With selective regression, the machine-learning model can make one of two choices for each input: it can make a prediction, or it can abstain if it doesn’t have enough confidence in its judgment.
When the model abstains, it reduces the fraction of data on which it makes predictions, which is known as coverage. By predicting only on inputs about which it is extremely sure, the model should improve its overall performance. However, this can accentuate biases that arise when the model lacks sufficient data from specific subgroups, and it can lead to inaccurate predictions for underrepresented people.
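To make the mechanics concrete, here is a minimal sketch in Python, using entirely synthetic data and placeholder names rather than anything from the study, of how coverage and per-subgroup selective risk behave once a confidence threshold is imposed.

```python
import numpy as np

# Toy illustration of selective regression with a confidence threshold.
# Targets, predictions, confidence scores, and group labels are all synthetic.

rng = np.random.default_rng(0)
n = 1000
y_true = rng.normal(size=n)                       # synthetic targets
y_pred = y_true + rng.normal(scale=0.5, size=n)   # synthetic model predictions
confidence = rng.uniform(size=n)                  # synthetic confidence scores
group = rng.integers(0, 2, size=n)                # 0 = majority, 1 = minority subgroup

threshold = 0.7
accepted = confidence >= threshold                # predict only when confident enough

coverage = accepted.mean()                        # fraction of inputs we predict on
overall_risk = np.mean((y_pred[accepted] - y_true[accepted]) ** 2)
print(f"overall: coverage={coverage:.2f}, selective risk={overall_risk:.3f}")

# Selective risk per subgroup: the same error, averaged only over the
# accepted samples that belong to that group.
for g in (0, 1):
    in_group = group == g
    mask = accepted & in_group
    group_risk = np.mean((y_pred[mask] - y_true[mask]) ** 2)
    print(f"group {g}: coverage={mask.sum() / in_group.sum():.2f}, "
          f"selective risk={group_risk:.3f}")
```

Even in this toy setup, the overall selective risk can look healthy while one subgroup's risk lags behind, which is exactly the gap the researchers set out to close.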
“It was challenging to come up with the right notion of fairness for this particular problem. But by enforcing this criterion, monotonic selective risk, we can make sure the model performance is actually getting better across all subgroups when you reduce the coverage,” explained Shah.
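Roughly speaking, the criterion asks that as coverage shrinks, the error on the predictions the model still makes should not get worse for any subgroup. A small sketch of such a check, under our own simplified reading of the criterion:

```python
import numpy as np

def selective_risk_curve(y_true, y_pred, confidence, thresholds):
    """Mean squared error on the accepted samples at each confidence threshold."""
    risks = []
    for t in thresholds:
        accepted = confidence >= t
        # If no samples are accepted at this threshold, record NaN and move on.
        risks.append(np.mean((y_pred[accepted] - y_true[accepted]) ** 2)
                     if accepted.any() else np.nan)
    return np.array(risks)

def has_monotonic_selective_risk(risks, tol=1e-9):
    """True if risk never increases as the threshold rises (i.e. as coverage falls)."""
    valid = risks[~np.isnan(risks)]
    return bool(np.all(np.diff(valid) <= tol))

# The fairness criterion would require this check to hold for every subgroup
# (evaluating the curve on each group's samples), not just on the full dataset.
```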
To overcome the challenge, the team created two neural network algorithms that impose this fairness criterion.
One technique ensures that the features the model uses to make predictions contain all of the information about the sensitive attributes in the dataset, such as race and gender, that is relevant to the target variable of interest. Sensitive attributes are characteristics that cannot be used to make decisions, usually due to laws or organizational policies. The second technique uses a calibration approach to ensure the model makes the same prediction for an input regardless of whether any sensitive attributes are added to that input.
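As a loose illustration of the second idea only, here is our own sketch (the names `Regressor` and `loss_with_consistency` and the masking scheme are assumptions, not the authors' calibration method): one simple way to encourage this kind of consistency is to penalize any gap between the model's predictions with and without the sensitive attribute visible.

```python
import torch
import torch.nn as nn

class Regressor(nn.Module):
    """Small regression network; the architecture here is arbitrary."""
    def __init__(self, n_features):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def loss_with_consistency(model, x, y, sensitive_idx, lam=1.0):
    """MSE plus a penalty when predictions shift after masking the sensitive column."""
    pred = model(x)
    x_masked = x.clone()
    x_masked[:, sensitive_idx] = 0.0          # hide the sensitive attribute
    pred_masked = model(x_masked)
    mse = ((pred - y) ** 2).mean()
    consistency = ((pred - pred_masked) ** 2).mean()
    return mse + lam * consistency
```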
The researchers evaluated the algorithms on real-world datasets that could be used in high-stakes decision-making. One, an insurance dataset, is used to predict total annual medical expenses charged to patients based on demographic data; the other, a crime dataset, is used to predict the number of violent crimes in communities based on socioeconomic data. Both datasets contain sensitive attributes for individuals.
When they layered their methods on top of a standard machine-learning technique for selective regression, they were able to reduce disparities by lowering error rates for the minority subgroups in each dataset. Moreover, this was accomplished with little impact on the overall error rate.
“We see that if we don’t impose certain constraints, in cases where the model is really confident, it could actually be making more errors, which could be very costly in some applications, like health care. So if we reverse the trend and make it more intuitive, we will catch a lot of these errors. A major goal of this work is to avoid errors going silently undetected,” said Sattigeri.
According to Shah, the researchers next intend to apply their solutions to other tasks, such as predicting house prices, student GPA, or loan interest rates, to see whether the algorithms need to be tuned for those tasks. To avoid privacy concerns, they also aim to explore techniques that use less sensitive information during model training.
They also seek to improve the confidence estimates in selective regression to avoid scenarios in which the model’s confidence is low but its prediction is correct. According to Sattigeri, this could lessen the workload on humans and further streamline the decision-making process.