Rebecca Watson had an interesting article/video about the ethics of A/B testing. A/B testing is a type of experiment often performed by tech companies on their users. The companies split users into two groups, show each group a different version of their software/website, and measure the results. The problem is that when scientists perform experiments on human subjects, there’s a formal ethical review process. Should tech companies have an ethical review process too?
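For anyone who hasn’t seen one up close, the mechanics are simple enough to sketch in a few lines of Python. This is just an illustration, not anyone’s real system: the experiment name, user IDs, and completion rates are all made up.

```python
import hashlib
import random
from statistics import mean

def assign_variant(user_id: str, experiment: str = "motivation-msg") -> str:
    """Deterministically bucket a user into group A or B by hashing their ID,
    so the same user always sees the same variant."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# Simulate outcomes: pretend variant B (say, the one with motivational
# messages) nudges completion rates up slightly. Rates are invented.
random.seed(0)
true_rate = {"A": 0.50, "B": 0.55}
outcomes = {"A": [], "B": []}
for i in range(10_000):
    variant = assign_variant(f"user{i}")
    outcomes[variant].append(1 if random.random() < true_rate[variant] else 0)

for variant in ("A", "B"):
    print(f"{variant}: n={len(outcomes[variant])}, "
          f"completion rate={mean(outcomes[variant]):.3f}")
```

The company then compares the two measured rates (ideally with a significance test) and ships whichever variant “won.” That’s the whole trick.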
Of course, this question is being raised as a result of a specific experiment performed by a specific company. Pearson produces educational software, and performed an A/B test where some students were shown motivational messages. They presented results at a conference, and part of their conclusion was that this was a promising methodology for future research. But is it really, if they didn’t comply with the ethical standards in science? They certainly didn’t get consent from all those human test subjects.
Watson also brought up another case from 2014, when Facebook performed an experiment that changed the amount of positive/negative posts people saw in their news feeds. They published a study titled “Experimental evidence of massive-scale emotional contagion through social networks”. Sounds pretty bad, eh?
Watson seems to conclude that A/B tests should get consent, at least in the case of Pearson. But I think this is going too far. The thing is, A/B testing is absolutely ubiquitous. Watson says, “having worked in marketing and seen A/B tests, it’s just a normal thing that companies do,” but I think this understates it. My fiance and I were trying to figure out how many A/B tests Google has running at any time, and we thought it might be one per employee, implying tens of thousands of experiments. And most of them are for boring things like changing fonts or increasing the number of pixels of white space. If we judge A/B tests on the basis of just two tests that appear in the news, “cherry picking” doesn’t even begin to describe it.
What people don’t like about these experiments is the idea of companies toying with their emotions. But the thing is, companies are already doing that, just by existing. Every piece of software, every website, involves countless design decisions–they are basically made of decisions–and plenty of those decisions are things the user is unaware of, and did not give consent to. Plenty of those decisions impact the emotional state of the user, and what can you do? Do companies need to undergo IRB review whenever they design any product for human use? And another review for each update?
What if, instead of embedding motivational messages into some students’ software, Pearson had just embedded messages into all of them? Maybe some programmer just thought it was a natural thing to do. Would we even think twice about it? I think I might roll my eyes at the messages, but if I wasn’t thinking of it as an experiment, I wouldn’t even think to apply the ethical standards that we apply to scientific experiments.
Hell, am I running low-level psychological experiments just by going about my daily life? Every time I talk, write, or go out in public, I’m affecting people’s emotions without asking their consent–and I don’t even have a proper control group, so what even is the point?
Okay, forget the daily life analogy, and take another one: suppose I’m a teacher. As a teacher, I might perform experiments on students in order to publish educational research. But even if I didn’t do that, I’d still be running experiments. Every year, I’d make some small changes in the classroom, and at the end of the year I’d conduct evaluations, and compare results to previous years. That’s just how you teach. So why is it that when I want to publish research or talk at a conference, I have to comply with more stringent ethical standards, when all I’d really be doing is basically the same thing I was doing anyway, albeit more systematically?
This isn’t even an analogy. Pearson makes educational software, and they were presenting at an educational research conference. It’s exactly the same issue. I don’t work in education so I don’t know the score, but when I looked it up, it seems to be a rather knotty question that educators ask among themselves.
Should there be some ethical oversight for A/B testing? Yes, and in some places there probably already is. But I don’t think A/B experiments are special when compared to other changes in software/website design. When Facebook showed some users more negative news feeds, that was bad, but it would have been bad even if it wasn’t part of an experiment. Maybe there are higher standards when people present the results of A/B testing at scientific conferences. But otherwise, I would treat A/B tests the same way I would treat any other changes that a company makes to their product.
As a physicist, I never did any research on human subjects. But I do volunteer for a community survey. We’d like to have IRB approval because we get cited by academics, and we give them data. It would give us more legitimacy. And perhaps we will get IRB approval one day. But it’s very difficult for a team with no budget, and only volunteers. In fact, one of our survey’s predecessors made it a goal to get IRB approval, and that project totally failed to get off the ground. So, we hold ourselves to ethical standards, but skip the bureaucratic process.