Andreas Avester summarized Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil. Now, I’m not sure how many readers remember this, but I’m a professional data scientist. Which doesn’t really qualify me as an authority to talk about data science, much less the ethics thereof, but, hey, it’s a thing. I have thoughts.
In my view there are two distinct[1] ethical issues with data science: 1) our models might make mistakes, or 2) our models might be too accurate. As I said in Andreas’ comments:
The first problem is obvious, so let me explain the second one. Suppose you found an algorithm that perfectly predicted people’s healthcare expenses, and started using it to price health insurance. Then you might as well not have health insurance at all, because each person pays exactly their own costs either way. This is “fair” in the sense that everyone pays exactly the burden they place on society. But it’s “unfair” in that the amount of healthcare expenses people have is mostly beyond their control. I think it would be better if our algorithms were actually less accurate, and we just charged everyone the same price–modulo, I don’t know, smoking.
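To make that point concrete, here’s a toy sketch of my own (not from the book, and the cost distribution is made up): premiums are modeled as a blend of a flat community rate and each person’s true cost, weighted by how accurate the insurer’s prediction is. The average premium never changes, but as accuracy rises, the spread of premiums grows until everyone is simply paying their own bill and the insurance pools nothing.

```python
import random
random.seed(0)

# Hypothetical annual healthcare costs (lognormal is just a stand-in
# for a skewed cost distribution; the parameters are arbitrary).
costs = [random.lognormvariate(7, 1.2) for _ in range(10_000)]
average = sum(costs) / len(costs)

def premiums(accuracy):
    """Premiums under a model with the given predictive accuracy:
    0.0 = flat community rate, 1.0 = perfect individualized pricing."""
    return [(1 - accuracy) * average + accuracy * c for c in costs]

def spread(prices):
    """Standard deviation of premiums across the pool."""
    mean = sum(prices) / len(prices)
    return (sum((p - mean) ** 2 for p in prices) / len(prices)) ** 0.5

for accuracy in (0.0, 0.5, 1.0):
    p = premiums(accuracy)
    print(f"accuracy={accuracy}: mean premium {sum(p) / len(p):,.0f}, "
          f"spread {spread(p):,.0f}")
```

At accuracy 1.0 each premium equals the individual’s own cost, so the “insurance” transfers nothing between people; the mean premium is identical at every accuracy level, which is why higher accuracy changes only who bears the burden, not how much is collected.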
The ethical perils of high accuracy predictions get right to the heart of the question: what is fair? Is it fair for people to pay a price according to how much healthcare they need? Or can we admit that the distribution of health problems is already unfair to begin with? And what do we do about health problems that are at least partially within people’s control? Do we charge people the full costs, or just enough to generate an effective incentive structure?
Andreas Avester quoted a section of the book talking about car insurance, which is fairly similar to my example of health insurance. If data science enables car insurance companies to more accurately predict who will get into car accidents, then this will increase inequality. O’Neil seems to agree on this point.
However, O’Neil chose to illustrate the problem with a hypothetical case of a good driver who has to commute through a bad neighborhood late at night. An insurance company that tracks her location might conclude that she is a risky driver, and charge her more for insurance. While I agree that this is unfair, it’s a poor example, because it suggests that the solution is to make our models more and more accurate, which might exacerbate the problem! As I said in the comments,
I think O’Neil chose that example because it’s just easier to sympathize with the good driver who is screwed by the algorithm. It’s harder to sympathize with all the bad drivers getting screwed. But they are nonetheless getting screwed, and increasingly accurate algorithms will hurt rather than help.
Here’s why I think “too much accuracy” is a bigger problem than “too many mistakes”. Improving accuracy is already in the interest of the companies that make these models. Insurance companies don’t want to charge the good driver higher prices, because a competing company that figures out she’s a good driver will be able to undercut them. Companies adopt data science methods precisely because they have proven to themselves that these methods make fewer mistakes. “Too much accuracy”, however, can only be addressed through policy, and I don’t trust companies to support those policies.[2]
But I should also offer a contrary argument. Too much accuracy might, in the long run, be the deeper problem, but in the short run, data science is a developing field that will certainly make mistakes. And even when it makes fewer mistakes than traditional methods, it’s possible that it makes worse mistakes. For example, the task of picking out job candidates from a stack of resumes is a deeply unfair and not very accurate process. But if gender identifiers are mostly hidden, we can hope that it’s at least equally unfair to people of all genders. On the other hand, if you give the stack of resumes to a neural network, the algorithm might be able to infer gender from various clues, and you might not even know it has done so. In principle, companies want more accurate models, but in practice, they might find it expedient to accept more discriminatory models.
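The proxy problem can be sketched in a few lines. This is my own toy illustration, not an example from the book: the synthetic data, the “hobby keyword” proxy feature, and the bias rates are all invented. Gender is dropped before “training”, yet a model that merely learns historical hire rates per feature value still reproduces the gender gap, because the remaining feature correlates with gender.

```python
import random
random.seed(1)

def person():
    """One synthetic applicant: (gender, proxy feature, historical hire)."""
    gender = random.choice(["A", "B"])
    # A hypothetical resume feature (say, a hobby keyword) that is much
    # more common for gender A than for gender B.
    proxy = 1 if random.random() < (0.8 if gender == "A" else 0.2) else 0
    # Biased historical decisions: gender B was hired less often.
    hired = random.random() < (0.6 if gender == "A" else 0.3)
    return gender, proxy, hired

data = [person() for _ in range(20_000)]

def rate(rows):
    return sum(hired for *_, hired in rows) / len(rows)

# "Train" on (proxy, hired) only -- the gender column is hidden.
model = {v: rate([r for r in data if r[1] == v]) for v in (0, 1)}

# The model's predicted hire probability still differs by gender,
# because the proxy feature encodes gender.
for g in ("A", "B"):
    rows = [r for r in data if r[0] == g]
    score = sum(model[r[1]] for r in rows) / len(rows)
    print(f"gender {g}: mean predicted hire probability {score:.2f}")
```

The model never sees gender, yet applicants of gender B get systematically lower scores, and nothing in the model’s inputs or outputs flags that this is happening.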
By the way, if any readers would be interested in me writing more about data science topics, let me know.
1. I don’t mean to say these are the only two ethical issues in data science. Andreas Avester’s summary also discussed perverse incentives, which strikes me as a third distinct problem. And although it was absent from this discussion, privacy concerns are obviously a big deal.
2. Although, in the case of insurance, I’d expect insurance companies to be in favor of regulation, because without regulation the insurance markets might just collapse.