Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy is a book about the societal impact of algorithms, written by Cathy O’Neil. It explores how various big data algorithms are increasingly used in ways that reinforce preexisting inequality.
You can consider this blog post a book review, except I won’t structure it the way book reviews are usually written. Instead, I will summarize the main problems discussed in the book. If you find the topic interesting, you can get the book for more details.
Here is how the Wikipedia entry summarizes the main topic of the book:
We live in the age of the algorithm. Increasingly, the decisions that affect our lives—where we go to school, whether we get a car loan, how much we pay for health insurance—are being made not by humans, but by mathematical models. In theory, this should lead to greater fairness: Everyone is judged according to the same rules, and bias is eliminated.
But as Cathy O’Neil reveals in this urgent and necessary book, the opposite is true. The models being used today are opaque, unregulated, and uncontestable, even when they’re wrong. Most troubling, they reinforce discrimination: If a poor student can’t get a loan because a lending model deems him too risky (by virtue of his zip code), he’s then cut off from the kind of education that could pull him out of poverty, and a vicious spiral ensues. Models are propping up the lucky and punishing the downtrodden, creating a “toxic cocktail for democracy.” Welcome to the dark side of Big Data.
These “weapons of math destruction” score teachers and students, sort résumés, grant (or deny) loans, evaluate workers, target voters, set parole, and monitor our health.
There are multiple problems with mathematical models that score people and sort them according to various criteria: they are opaque, unregulated, and difficult to contest. At the same time, they are scalable, which amplifies any inherent biases across ever larger populations. A single racist bank or insurance company employee can unfairly harm hundreds of people of color by giving them higher interest rates or insurance costs. A single poorly designed algorithm can harm millions of people.
Algorithms are not necessarily more equal than humans.
People are biased. They are bad at evaluating strangers, and usually they don’t even notice their biases. For example, in experiments where scientists sent out identical fake résumés with either male or female names, the people making the hiring decisions preferred the male names. The same trend has been observed in experiments with white-sounding versus black-sounding names. Thus some people might imagine that an algorithm should be better at scoring us; after all, a computer evaluates everyone equally. Except when it doesn’t.
In her book, Cathy O’Neil gives multiple case studies where standardized testing was worse than actually interviewing people. Let’s imagine a dark-skinned person looking for a job. They go to a job interview and get refused because of their skin color. Firstly, they can sue the business that refused to hire them. Secondly, they can keep looking until they find some other employer who doesn’t have racial prejudices.
Now let’s imagine that a person gets refused due to scoring poorly in some standardized personality questionnaire. Firstly, they won’t even find out why they got refused. The algorithm that scored their questionnaire is not transparent at all. Even the human resources people at that business probably have no idea why some person scored poorly and was rated as unfit for hiring. Secondly, if this is some standardized test, then many businesses will use it, thus the same person will get declined again and again due to scoring poorly each time.
It is hard to say whether standardized tests are better than letting individual people make hiring/admission decisions. It depends on how a given standardized test was created, who made it, and how good or bad it is. Some standardized tests are much worse than others, and many of them serve to further reinforce preexisting inequality. Of course, it also depends on the people who make the hiring/admission decisions at a given institution (some of them are much more prejudiced than others).
Algorithms are opaque and can be used to unfairly evaluate millions of people.
Algorithms that automatically sort people into groups according to various criteria can ruin a person’s life even when the programmers who made the software had the best intentions. Plenty of programmers, banks, and insurance companies have attempted to create racially unbiased algorithms based upon statistics. People have already tried to go beyond “black=untrustworthy.” Unfortunately, it’s not that simple.
Let’s say you tried to build an unbiased mathematical model based solely upon statistics. The algorithm would probably find a correlation showing that people who earn little money are more likely to fail to pay back their loans on time. It would probably also find that people who live in certain poor neighborhoods are more likely to fail to pay on time. The end result: a wealthy white guy who lives in a rich neighborhood gets to borrow cheaply, while a black person who lives in a neighborhood the algorithm designated as “risky” can only get payday loans with ridiculously high interest rates.
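The mechanism is easy to demonstrate. Below is a minimal Python sketch (all loan records, rates, and zip codes are hypothetical, invented for illustration) of a “statistically neutral” lending model: it never sees race, only past repayment records grouped by zip code, yet in a segregated city it reproduces exactly the disparity described above.

```python
from collections import defaultdict

# Hypothetical loan history: (zip_code, repaid) pairs. The model never
# sees race, yet in a segregated city zip code correlates with it.
history = [
    ("07102", False), ("07102", False), ("07102", True),   # poorer area
    ("07042", True),  ("07042", True),  ("07042", True),   # wealthier suburb
    ("07042", False), ("07102", True),  ("07102", False),
]

def default_rate_by_zip(loans):
    """Estimate P(default) per zip code from past loans."""
    totals, defaults = defaultdict(int), defaultdict(int)
    for zip_code, repaid in loans:
        totals[zip_code] += 1
        if not repaid:
            defaults[zip_code] += 1
    return {z: defaults[z] / totals[z] for z in totals}

def quoted_rate(zip_code, rates, base=0.05, premium=0.30):
    """A 'neutral' pricing rule: base interest rate plus a risk premium
    proportional to the neighborhood's historical default rate."""
    return base + premium * rates.get(zip_code, 0.0)

rates = default_rate_by_zip(history)
print(quoted_rate("07042", rates))  # wealthy suburb: cheaper loan
print(quoted_rate("07102", rates))  # poor neighborhood: pricier loan
```

Two applicants with identical personal finances get different prices purely because of where they live; the zip code has quietly become a proxy for everything that correlates with it.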
When computers sort people, the underlying mathematical model tends to be opaque. The programmers who created it usually don’t want to disclose how exactly their black box works. After all, if people knew exactly how they were being evaluated, they would try to game the system. Or maybe the programmers don’t want to disclose the inner workings of their software because they know they are selling snake oil.
Either way, if you go to a bank and speak face to face with some employee who refuses you because of your skin color, you can at least sue the bank. However, when your application is rejected by some mysterious algorithm, you don’t even know the reasons and cannot sue anybody. And it’s not just loans. Algorithms are also used for hiring people, deciding which students get accepted to a university, calculating how much their insurance will cost, etc.
Algorithms can reinforce preexisting inequality.
Have you noticed that in the USA wealthy people can borrow money with little interest, while poor people are denied normal loans and instead have access only to payday loans with sky high interest rates? Have you noticed that car insurance costs less for a wealthy person compared to a poor person? Such discrepancies further reinforce existing inequality.
And you don’t even need a racist programmer for things to go wrong. It’s very easy to unintentionally create a mathematical model that will mistreat already poor customers. Here is a quote from the book about how car insurance rates are determined:
Leading insurers including Progressive, State Farm, and Travelers are already offering drivers a discount on their rates if they agree to share their driving data. A small telemetric unit in the car, a simple version of the black boxes in airplanes, logs the speed of the car and how the driver brakes and accelerates. A GPS monitor tracks the car’s movements.
. . . The individual driver comes into focus. Consider eighteen-year-olds. Traditionally they pay sky-high rates because their age group, statistically, indulges in more than its share of recklessness. But now, a high school senior who avoids jackrabbit starts, drives at a consistent pace under the speed limit, and eases to a stop at red lights might get a discounted rate. Insurance companies have long given an edge to young motorists who finish driver’s ed or make the honor roll. Those are proxies for responsible driving. But driving data is the real thing. That’s better, right?
There are a couple of problems. First, if the system attributes risk to geography, poor drivers lose out. They are more likely to drive in what insurers deem risky neighborhoods. Many also have long and irregular commutes, which translates into higher risk.
Fine, you might say. If poor neighborhoods are riskier, especially for auto theft, why should insurance companies ignore that information? And if longer commutes increase the chance of accidents, that’s something the insurers are entitled to consider. The judgment is still based on the driver’s behavior, not on extraneous details like her credit rating or the driving records of people her age. Many would consider that an improvement.
To a degree, it is. But consider a hypothetical driver who lives in a rough section of Newark, New Jersey, and must commute thirteen miles to a barista job at a Starbucks in the wealthy suburb of Montclair. Her schedule is chaotic and includes occasional clopenings. So she shuts the shop at 11, drives back to Newark, and returns before 5 a.m. To save ten minutes and $1.50 each way on the Garden State Parkway, she takes a shortcut, which leads her down a road lined with bars and strip joints.
A data-savvy insurer will note that cars traveling along that route in the wee hours have an increased risk of accidents. There are more than a few drunks on the road. And to be fair, our barista is adding a bit of risk by taking the shortcut and sharing the road with the people spilling out of the bars. One of them might hit her. But as far as the insurance company’s geo-tracker is concerned, not only is she mingling with drunks, she may be one.
In this way, even the models that track our personal behavior gain many of their insights, and assess risk, by comparing us to others. This time, instead of bucketing people who speak Arabic or Urdu, live in the same zip codes, or earn similar salaries, they assemble groups of us who act in similar ways. The prediction is that those who act alike will take on similar levels of risk. If you haven’t noticed, this is birds of a feather all over again, with many of the same injustices.
Cathy O’Neil didn’t say it straight, but the chances are pretty high that the hypothetical barista happens to be black (since black people, on average, earn less and are forced to accept worse jobs). With all this data we still get back where we started—if you are poor and black, your car insurance will cost more than if you are white and rich. And it’s not just that. Your loan will have a higher interest rate. Your job application will get denied by some mysterious algorithm. The unfairness will be perpetuated.
Faulty algorithms can create vicious feedback loops that result in self-fulfilling prophecies.
In 2013, William Heim, the police chief of Reading (a small city in Pennsylvania), invested in crime prediction software made by PredPol, a big data start-up. The program processed historical crime data and calculated, hour by hour, where crimes were most likely to occur. Police officers could view the program’s conclusions as a series of squares on a map. The idea was that if they spent more time patrolling these locations, there was a good chance they would discourage crime. Predictive programs like PredPol are common in US police departments; New York City, for example, uses a similar program called CompStat.
In theory, the model is blind to race and ethnicity: PredPol doesn’t focus on the individual; instead, it targets geography. The key inputs are the type and location of each crime and when it occurred. At first glance that might seem fair, and it seems useful for cops to spend more time in the high-risk zones.
The problem is that the model takes into account not only homicides and burglaries but also petty crimes. Serious violent crimes are usually reported to the police regardless of whether an officer was nearby when the crime happened. But the model also considers far less serious offenses, including vagrancy, aggressive panhandling, and selling and consuming small quantities of drugs. Many of these “nuisance” crimes would go unrecorded if a cop weren’t there to see them. Here’s the problem:
Once the nuisance data flows into a predictive model, more police are drawn into those neighborhoods, where they’re more likely to arrest more people… This creates a pernicious feedback loop. The policing itself spawns new data, which justifies more policing. And our prisons fill up with hundreds of thousands of people found guilty of victimless crimes. Most of them come from impoverished neighborhoods, and most are black or Hispanic. So even if a model is color blind, the result of it is anything but. In our largely segregated cities, geography is a highly effective proxy for race.
If the purpose of the models is to prevent serious crimes, you might ask why nuisance crimes are tracked at all. The answer is that the link between antisocial behavior and crime has been an article of faith since 1982, when a criminologist named George Kelling teamed up with a public policy expert, James Q. Wilson, to write a seminal article in the Atlantic Monthly on so-called broken-windows policing. The idea was that low-level crimes and misdemeanors created an atmosphere of disorder in a neighborhood. This scared law-abiding citizens away. The dark and empty streets they left behind were breeding grounds for serious crime. The antidote was for society to resist the spread of disorder. This included fixing broken windows, cleaning up graffiti-covered subway cars, and taking steps to discourage nuisance crimes.
Here we have it: big data is used to make the abuse of poor people sound “scientifically justified.” Wherever police go looking for crime, that is where they will find it:
Just imagine if police enforced their zero-tolerance strategy in finance. They would arrest people for even the slightest infraction, whether it was chiseling investors on 401ks, providing misleading guidance, or committing petty frauds. Perhaps SWAT teams would descend on Greenwich, Connecticut. They’d go undercover in the taverns around Chicago’s Mercantile Exchange.
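The feedback loop O’Neil describes can be made concrete with a tiny Python simulation (the numbers are hypothetical): two neighborhoods have the same true rate of petty crime, but recorded crime depends on how many officers are around to see it, and next week’s patrols are allocated according to the records.

```python
def simulate(initial_records, total_patrols=100, detection=1.0, weeks=10):
    """Both neighborhoods have the SAME true rate of petty crime;
    only the starting arrest records differ."""
    records = list(initial_records)
    for _ in range(weeks):
        total = sum(records)
        # Next week's patrols follow this week's records...
        patrols = [total_patrols * r / total for r in records]
        # ...and recorded nuisance crime is proportional to the number
        # of officers present, not to the (identical) true behavior.
        records = [p * detection for p in patrols]
    return records

final = simulate([60, 40])
print(final)  # the initial 60/40 disparity persists indefinitely
```

The initial 60/40 disparity in the records never corrects itself, because the data the model learns from is produced by the model’s own deployment decisions.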
Once poor people of color are caught committing some petty crime, another mathematical model is used to sentence them to harsher punishments—the recidivism model used for sentencing guidelines.
In the USA, race has long been a factor in sentencing:
A University of Maryland study showed that in Harris County, which includes Houston, prosecutors were three times more likely to seek the death penalty for African Americans, and four times more likely for Hispanics, than for whites convicted of the same charges. That pattern isn’t unique to Texas. According to the American Civil Liberties Union, sentences imposed on black men in the federal system are nearly 20 percent longer than those for whites convicted of similar crimes. And though they make up only 13 percent of the population, blacks fill up 40 percent of America’s prison cells.
So you might think that computerized risk models fed by data would reduce the role of prejudice in sentencing and contribute to more even-handed treatment. With that hope, courts in twenty-four states have turned to so-called recidivism models. These help judges assess the danger posed by each convict. And by many measures they’re an improvement. They keep sentences more consistent and less likely to be swayed by the moods and biases of judges.
The question, however, is whether we’ve eliminated human bias or simply camouflaged it with technology. The new recidivism models are complicated and mathematical. But embedded within these models are a host of assumptions, some of them prejudicial…
One of the more popular models, known as LSI–R, or Level of Service Inventory–Revised, includes a lengthy questionnaire for the prisoner to fill out. One of the questions—“How many prior convictions have you had?”—is highly relevant to the risk of recidivism. Others are also clearly related: “What part did others play in the offense? What part did drugs and alcohol play?”
But as the questions continue, delving deeper into the person’s life, it’s easy to imagine how inmates from a privileged background would answer one way and those from tough inner-city streets another. Ask a criminal who grew up in comfortable suburbs about “the first time you were ever involved with the police,” and he might not have a single incident to report other than the one that brought him to prison. Young black males, by contrast, are likely to have been stopped by police dozens of times, even when they’ve done nothing wrong. A 2013 study by the New York Civil Liberties Union found that while black and Latino males between the ages of fourteen and twenty-four made up only 4.7 percent of the city’s population, they accounted for 40.6 percent of the stop-and-frisk checks by police. More than 90 percent of those stopped were innocent. Some of the others might have been drinking underage or carrying a joint. And unlike most rich kids, they got in trouble for it. So if early “involvement” with the police signals recidivism, poor people and racial minorities look far riskier.
The questions hardly stop there. Prisoners are also asked about whether their friends and relatives have criminal records. Again, ask that question to a convicted criminal raised in a middle-class neighborhood, and the chances are much greater that the answer will be no. The questionnaire does avoid asking about race, which is illegal. But with the wealth of detail each prisoner provides, that single illegal question is almost superfluous.
The LSI–R questionnaire has been given to thousands of inmates since its invention in 1995. Statisticians have used those results to devise a system in which answers highly correlated to recidivism weigh more heavily and count for more points. After answering the questionnaire, convicts are categorized as high, medium, and low risk on the basis of the number of points they accumulate. In some states, such as Rhode Island, these tests are used only to target those with high-risk scores for antirecidivism programs while incarcerated. But in others, including Idaho and Colorado, judges use the scores to guide their sentencing.
This is unjust. The questionnaire includes circumstances of a criminal’s birth and upbringing, including his or her family, neighborhood, and friends. These details should not be relevant to a criminal case or to the sentencing. Indeed, if a prosecutor attempted to tar a defendant by mentioning his brother’s criminal record or the high crime rate in his neighborhood, a decent defense attorney would roar, “Objection, Your Honor!” And a serious judge would sustain it. . . But even if we put aside, ever so briefly, the crucial issue of fairness, we find ourselves descending into a pernicious WMD feedback loop. A person who scores as “high risk” is likely to be unemployed and to come from a neighborhood where many of his friends and family have had run-ins with the law. Thanks in part to the resulting high score on the evaluation, he gets a longer sentence, locking him away for more years in a prison where he’s surrounded by fellow criminals—which raises the likelihood that he’ll return to prison. He is finally released into the same poor neighborhood, this time with a criminal record, which makes it that much harder to find a job. If he commits another crime, the recidivism model can claim another success. But in fact the model itself contributes to a toxic cycle and helps to sustain it. . .
What’s more, for supposedly scientific systems, the recidivism models are logically flawed. The unquestioned assumption is that locking away “high-risk” prisoners for more time makes society safer. It is true, of course, that prisoners don’t commit crimes against society while behind bars. But is it possible that their time in prison has an effect on their behavior once they step out? Is there a chance that years in a brutal environment surrounded by felons might make them more likely, and not less, to commit another crime?
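The scoring mechanism described above (weighted questionnaire answers summed into points, then bucketed into risk categories) can be sketched in a few lines of Python. The questions, weights, and thresholds here are invented for illustration; they are not the actual LSI–R instrument.

```python
# Invented weights: answers more correlated with recidivism count for more.
WEIGHTS = {
    "prior_convictions": 3,         # directly about past behavior
    "drugs_or_alcohol_involved": 2,
    "early_police_contact": 2,      # proxy for neighborhood and policing
    "family_criminal_record": 1,    # circumstance of birth, not behavior
}

def risk_score(answers):
    """Sum the weights of every 'yes' answer."""
    return sum(WEIGHTS[q] for q, yes in answers.items() if yes)

def risk_category(score, low=3, high=6):
    """Bucket the point total into low / medium / high risk."""
    if score >= high:
        return "high"
    return "medium" if score >= low else "low"

# Two first-time offenders convicted of the same crime:
suburban = dict.fromkeys(WEIGHTS, False)
inner_city = dict(suburban, early_police_contact=True,
                  family_criminal_record=True)

print(risk_category(risk_score(suburban)))    # low
print(risk_category(risk_score(inner_city)))  # medium: scored for circumstances
```

The two convicts differ only in circumstances they never chose, yet one of them leaves the questionnaire with a higher risk category and, in some states, a longer sentence.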
Algorithms can create perverse incentives that result in poor outcomes.
Perverse incentives can result in people taking all sorts of generally harmful actions in an attempt to game the algorithm. A mathematical model cannot directly measure whatever it claims to measure; instead, it must rely on proxies.
For example, let’s consider the U.S. News college rankings, which have been extremely harmful for American colleges and universities. Measuring and scoring the excellence of universities is inherently impossible (how do you even quantify that?), so U.S. News simply picked an assortment of measurable proxies. In order to improve their scores, colleges responded by trying to improve each of the metrics that went into the score. For example, if a college increases tuition and uses the extra money to build a fancy gym with whirlpool baths, that will increase its ranking. Hence many colleges have done various things that increased their score but were actually harmful for the students.
Some universities decided to go even further and outright manipulated their score. For example, in a 2014 U.S. News ranking of global universities, the mathematics department at Saudi Arabia’s King Abdulaziz University landed in seventh place, right next to Harvard. The Saudi university contacted several mathematicians whose work was highly cited and offered them thousands of dollars to serve as adjunct faculty. These mathematicians would work three weeks a year in Saudi Arabia. The university would fly them there in business class and put them up at a five-star hotel. The deal also required that the Saudi university could claim the publications of their new adjunct faculty as its own. Since citations were one of the U.S. News algorithm’s primary inputs, King Abdulaziz University soared in the rankings. That’s how you game the system.
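Rankings of this kind are typically just weighted sums of proxy metrics, which is precisely what makes them gameable: improve any single input and the whole score moves. Here is a toy Python sketch (weights and metrics invented, not the real U.S. News formula):

```python
# Invented weights over invented proxy metrics, each normalized to [0, 1].
WEIGHTS = {"citations": 0.5, "graduation_rate": 0.3, "spending_per_student": 0.2}

def score(metrics):
    """Weighted sum of proxy metrics: the whole 'excellence' ranking."""
    return sum(WEIGHTS[k] * metrics[k] for k in WEIGHTS)

honest = {"citations": 0.40, "graduation_rate": 0.90, "spending_per_student": 0.50}
# Buy highly cited adjunct faculty and claim their publications as your own:
gamed = dict(honest, citations=0.95)

print(score(honest))  # baseline score
print(score(gamed))   # one manipulated input lifts the whole ranking
```

Nothing about teaching or student outcomes changed between the two scores; only the easiest-to-buy proxy did.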
Scoring an individual person by analyzing the behavior of somebody else is unethical.
Theoretically, big data can lead to a situation where insurance companies or banks learn so much about us that they can pinpoint those who appear to be the riskiest customers and then either drive their rates into the stratosphere or, where legal, deny them coverage. I consider that unethical. The whole point of insurance is to pool risk, to smooth out life’s bumps. It would be better for everybody to pay the average rather than pay their anticipated costs in advance.
Moreover, there is also the question of how ethical it is to judge an individual based upon the average behavior of other people who happen to have something in common with them. After all, you aren’t dealing with “the average black person,” “the average woman,” “the average person who earns $XXXXX per year,” “the average person who lives in this neighborhood,” or “the average person with an erratic schedule of long commutes”; you are dealing with a unique individual human being. The inevitable end result of such a system is that some innocent person who hasn’t done anything bad at all will get punished because an algorithm put them in the same bucket with other people who have engaged in bad or risky behavior.
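The difference between pooling risk and pricing each individual’s anticipated costs can be shown with trivial arithmetic (the dollar figures below are hypothetical):

```python
# Hypothetical annual expected costs; one customer is flagged high-risk.
expected_costs = [200, 250, 300, 1200]

# Pooled insurance: everyone pays the average, and risk is shared.
pooled_premium = sum(expected_costs) / len(expected_costs)

# Fully individualized pricing: everyone prepays their own expected cost,
# so the riskiest person gets no insurance effect at all.
individual_premiums = expected_costs

print(pooled_premium)           # 487.5 for everyone
print(individual_premiums[-1])  # 1200 for the flagged customer alone
```

The more precisely an insurer can predict each person’s costs, the less “insurance” remains for exactly the people who need it most.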
Promising efficiency and fairness, mathematical models distort higher education, drive up debt, spur mass incarceration, pummel the poor at nearly every juncture, and undermine democracy. As Cathy O’Neil explains:
The problem is that they’re feeding on each other. Poor people are more likely to have bad credit and live in high-crime neighborhoods, surrounded by other poor people. Once the dark universe of WMDs digests that data, it showers them with predatory ads for subprime loans or for-profit schools. It sends more police to arrest them, and when they’re convicted it sentences them to longer terms. This data feeds into other WMDs, which score the same people as high risks or easy targets and proceed to block them from jobs, while jacking up their rates for mortgages, car loans, and every kind of insurance imaginable. This drives their credit rating down further, creating nothing less than a death spiral of modeling. Being poor in a world of WMDs is getting more and more dangerous and expensive.
There is a problem. How do we fix it? Well, you certainly cannot rely on corporations to fix their own flawed mathematical models. Justice and fairness might benefit society as a whole, but they do nothing for a corporation’s bottom line. In fact, entire business models, such as for-profit universities and payday loans, are built upon further abusing already marginalized groups of people. As long as profit is generated, corporations believe that their flawed mathematical models are working just fine.
The victims feel differently, but the greatest number of them—the hourly workers and unemployed, the people who have low credit scores—are poor. They are also powerless and voiceless. Many are disenfranchised politically.
Thus we can conclude that these problems won’t fix themselves. Cathy O’Neil proposes various solutions. Data scientists should audit algorithms and search for potential problems. Politicians should pass laws that forbid companies from evaluating their employees or customers based upon dubious data points. Society as a whole needs to demand transparency. For example, each person should have the right to receive an alert when a credit score is being used to judge or vet them; they should have access to the information being used to compute that score; and if it is incorrect, they should have the right to challenge and correct it. Moreover, mathematical models that have a significant impact on people’s lives (like credit scores, e-scores, etc.) should be open and available to the public.