Study: Sentencing software just as bad at predicting recidivism as untrained volunteers
Researchers found that both a specialty algorithm and untrained volunteers were about 65 percent accurate in predicting recidivism.
Software used by judges in sentencing hearings to create recidivism “risk scores” is no better at predicting the likelihood of a repeat offense than untrained volunteers or a simple predictor that uses only two variables, according to a new study from researchers at Dartmouth College.
Julia Dressel, now a software engineer in Silicon Valley, examined software that has been at the center of a public controversy since ProPublica questioned its false-positive rate: the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) tool, developed by Equivant.
Dressel and Dartmouth computer science professor Hany Farid wanted to use the data collected by ProPublica to answer a different question: Is the software any better at making these decisions than a human?
The answer they landed on is no.
Both the software and humans were about 65 percent accurate, “which is better than chance,” Dressel told GCN. “But if you’re in the position where you’re a criminal defendant and your fate may be relying on this tool that’s only 65 percent accurate, you might want better odds than that.”
The researchers used Amazon's Mechanical Turk crowdsourcing marketplace to find volunteers, who were asked to predict whether individuals would recidivate within two years of their most recent crime. The volunteers made their decisions after being shown information including each offender's sex, age, criminal history and, in some cases, race.
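To make concrete how such predictions are scored, here is a minimal sketch using simulated stand-in data rather than the study's actual responses. It compares a set of yes/no predictions against observed outcomes and against a guess-the-majority baseline, which is what "better than chance" means here:

```python
import numpy as np

# A toy simulation, not the study's data: 1,000 hypothetical defendants,
# binary outcomes, and predictions that agree with the truth ~65% of the time.
rng = np.random.default_rng(0)
outcomes = rng.integers(0, 2, size=1000)        # 1 = recidivated within two years
predictions = np.where(rng.random(1000) < 0.65, # right ~65% of the time
                       outcomes, 1 - outcomes)

accuracy = (predictions == outcomes).mean()
# "Chance" here means always guessing the more common outcome.
baseline = max(outcomes.mean(), 1 - outcomes.mean())
print(f"accuracy: {accuracy:.2f}  vs  majority-class baseline: {baseline:.2f}")
```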
After the humans and the software delivered similar, better-than-chance predictions, the researchers tried to create an algorithm that would perform better than both COMPAS and untrained humans. But they weren't able to.
“We typically expect that as we add more data to the classifier or increase the complexity of the classifier that the classification accuracy would improve, but we found that it wasn’t the case,” Dressel said. One simple linear classifier used just two variables -- age and total number of previous convictions -- and it performed as well as COMPAS (a sketch of such a classifier appears below).
“Every time we kept reaching this accuracy level of about 65 percent. And that suggests -- though it doesn’t prove -- that increasing that accuracy is really hard and might not be possible.”
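For concreteness, here is a minimal sketch of that kind of two-feature classifier, fit with logistic regression on ProPublica's public COMPAS data. The file and column names (compas-scores-two-years.csv, age, priors_count, two_year_recid) follow ProPublica's released dataset and are assumptions about its exact layout, not details taken from the study:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# ProPublica's public COMPAS dataset (column names assumed from its CSV).
df = pd.read_csv("compas-scores-two-years.csv")

X = df[["age", "priors_count"]]   # the two predictors the study highlights
y = df["two_year_recid"]          # 1 if rearrested within two years

clf = LogisticRegression()
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```

Per the study, a classifier of roughly this form matched COMPAS's approximately 65 percent accuracy.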
Why do all of these models top out at about 65 percent accuracy? The variables being used to predict crime might simply not be good indicators of future criminal activity, Dressel said.
This doesn’t mean the goal of these algorithms -- adding objectivity and accuracy to the criminal justice system -- isn’t admirable. It just means the software isn’t meeting that goal when it comes to predicting recidivism, she said.
This is a complex issue for governments, and it’s one that the National Institute of Standards and Technology, or a similar standards organization, will have to help with, Dressel suggested. An algorithm should have to pass some kind of validation or meet a set of standards before it can be used in a way that has this much impact on an individual’s life, she said. It’s up to the governments and courts that use these products to hold the technology companies accountable, she added.
A positive step would be more transparency: requiring that algorithms and their accuracy rates be made public. Defendants should know how their risk scores were determined, and judges might use the tools differently if they’re aware of the tools’ shortcomings.
The New York City Council introduced a bill last year that would have required not just risk assessment algorithms, but algorithms used in many different decision-making processes, to be made public, according to The New York Times.
“If a judge knows that a tool is only around 65 percent accurate then they will incorporate that risk assessment into their decision differently than if they assume this tool has the right, magic answer,” Dressel said.
An official response from COMPAS developer Equivant questioned the math behind the researchers' analysis, but then concluded that the study proved the software’s effectiveness.
“[There is a] growing number of independent studies that have confirmed that COMPAS achieves good predictability and matches the increasingly accepted … standard of 0.70 for well-designed risk assessment tools used in criminal justice,” the company said.
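The 0.70 threshold Equivant invokes is the level commonly cited for the area under the ROC curve (AUC), a measure of how often a tool assigns a higher risk score to someone who goes on to reoffend than to someone who does not. As a toy illustration with entirely made-up numbers:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical outcomes and risk scores, illustrative only.
y_true = [0, 1, 1, 0, 1, 0, 1, 0]               # 1 = reoffended
risk   = [0.2, 0.8, 0.3, 0.5, 0.6, 0.7, 0.4, 0.1]  # tool's risk scores
print(f"AUC: {roc_auc_score(y_true, risk):.2f}")   # 0.69, just under the 0.70 benchmark
```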
Equivant also said it would ask the researchers for the data used in their analysis and would review it.
It’s important, Dressel said, to better understand these tools before they’re entrusted with such important decisions.
“Predicting the future is really hard, and it's not actually surprising that we aren’t able to predict crimes with that high of an accuracy,” she said.