Spain has become reliant on an algorithm to score how likely a domestic violence victim may be abused again and what protection to provide — sometimes leading to fatal consequences.
The crucial point is: 8% of the decisions turn out to be wrong or misjudged.
The article says:
Yet roughly 8 percent of women who the algorithm found to be at negligible risk and 14 percent at low risk have reported being harmed again, according to Spain’s Interior Ministry, which oversees the system.
Granted, neither “negligible” nor “low risk” means “no risk”, but I think 8% and 14% are far too high for those categories.
Furthermore, there’s this crucial bit:
At least 247 women have also been killed by their current or former partner since 2007 after being assessed by VioGén, according to government figures. While that is a tiny fraction of gender violence cases, it points to the algorithm’s flaws. The New York Times found that in a judicial review of 98 of those homicides, 55 of the slain women were scored by VioGén as negligible or low risk for repeat abuse.
So in the 98 murders they reviewed, the algorithm put more than 50% of them at negligible or low risk for repeat abuse. That’s a fucking coin flip!
You’ll get that result without an algorithm as well, unfortunately. A domestic violence interview often doesn’t get you the truth about what happened, because the victim is often economically and emotionally dependent on their partner. It’s helpful to have an algorithm that makes you ask the right questions, but there’s still no way I know of to get the right answers to those questions from a victim 100 percent of the time.
Odd. I replied to this comment, but now my reply is gone. Gonna try again and type up as much as I can remember.
Regardless, an algorithm expecting binary answers will obviously not take para- and extralinguistic cues into account. That extra 50 ms hesitation, the downwards glance and the voice cracking when answering “no” to “has he ever tried to strangle you before?” have a reasonable chance of getting picked up by a human, but once the answer is reduced to something the algorithm can handle, it’s just a simple “no”. Humans are really good at picking up on such cues, even if they aren’t consciously aware that they’re doing it, but if said humans are preoccupied with staring at a computer screen in order to input the answers to the questionnaire, then there’s a much higher chance that they’ll miss them too. I honestly only see negatives here.
It’s helpful to have an algorithm that makes you ask the right questions […]
Arguably a piece of paper could solve that problem.
Seriously. 55 victims out of the 98 homicide cases sampled were deemed at negligible or low risk. If a non-algorithm-assisted department presented those numbers, I’d expect them to be looking for new jobs real fast.
I think beyond that it’s purely the failure of the interviewer and not the tool. I think getting rid of the tool will just leave you with shitty interviewers and back to the same situation as you had before.
I’ve given plenty of algorithm-driven assessments myself, though mine are generally much shorter and the weights on the questions much simpler (plus I know the actual reasons behind the weights of my questions and why I’m asking them). You can always intervene when someone’s lying and redirect them, and you can override the algorithm, just as this Spanish policy allows. Lazy judges and police will exist without the tool.
It might be helpful for the tool to include a label that the interviewer thinks the result is unreliable due to the evasiveness of the interviewee, if only to show where the problems are coming from.
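To make that concrete, here is a rough sketch of what such a label could look like in a record. Everything in it is made up for illustration: the class name, the field names, and the review rule are my assumptions, and only “negligible” and “low” come from the article (the other risk level names are placeholders). It is not how VioGén actually stores anything.

```python
from dataclasses import dataclass
from typing import Optional

# Only "negligible" and "low" are named in the article;
# the remaining level names are assumed placeholders.
RISK_LEVELS = ("negligible", "low", "medium", "high", "extreme")


@dataclass
class Assessment:
    """Hypothetical record for one risk assessment (illustrative only)."""
    algorithm_score: str                        # level the tool computed
    interviewer_override: Optional[str] = None  # level set by the human, if any
    answers_unreliable: bool = False            # interviewer thinks answers were evasive
    note: str = ""                              # free-text reason for the flag

    def final_risk(self) -> str:
        # A human override, when present, wins over the tool's score.
        return self.interviewer_override or self.algorithm_score

    def needs_review(self) -> bool:
        # Assumed rule: a low-end score built on doubtful answers
        # should be looked at by a second person.
        return self.answers_unreliable and self.final_risk() in ("negligible", "low")


flagged = Assessment("low", answers_unreliable=True, note="avoided questions about threats")
overridden = Assessment("low", interviewer_override="high", answers_unreliable=True)
```

With a flag like that, `flagged.needs_review()` is true while `overridden.needs_review()` is false, so you could at least count how many of the low scores rested on answers the interviewer did not trust.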