Why the algorithm is so often wrong

As a data scientist, the number one question I hear from friends is “How did the algorithm get that so wrong?” People don’t know it, but that’s a data science question.

For example, Facebook apparently thinks I’m trans, so they keep on advertising HRT to me. How did they get that one wrong? Surely Facebook knows I haven’t changed pronouns in my entire time on the platform.

I don’t know why the algorithm got it wrong in any particular case, but it’s not remotely surprising. For my job, I build algorithms like that (not for social media specifically, but it’s the same general idea), and as part of the process I directly measure how often the algorithm is wrong. Some of the algorithms I have created are wrong 99.8% of the time, and I sure put a lot of work into making that number a tiny bit lower. It’s a fantastically rare case where we can build an algorithm that’s just right all the time.
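Here is a toy sketch of what that kind of measurement can look like. Everything in it is invented for illustration: the function names, the 1,000 flagged cases, and the 0.05% base rate are assumptions, not numbers from any real system. The point is only that an algorithm can be wrong 99.8% of the time and still beat picking at random.

```python
def wrong_rate(predictions, outcomes):
    """Fraction of the algorithm's positive guesses that turned out wrong."""
    # Keep only the outcomes for cases the algorithm flagged as positive.
    flagged = [hit for guess, hit in zip(predictions, outcomes) if guess]
    if not flagged:
        raise ValueError("the algorithm made no positive guesses")
    return 1 - sum(flagged) / len(flagged)


def lift(hit_rate, base_rate):
    """How many times better than random picking the algorithm does."""
    return hit_rate / base_rate


# Hypothetical data: the algorithm flags 1,000 people and is right about 2.
predictions = [True] * 1000
outcomes = [True] * 2 + [False] * 998

rate = wrong_rate(predictions, outcomes)
print(f"wrong {rate:.1%} of the time")

# Assumed base rate: 1 in 2,000 people would be a hit if picked at random.
print(f"lift over random: {lift(1 - rate, 0.0005):.1f}x")
```

Under these made-up numbers, the algorithm is wrong almost every time and is still several times better than guessing blindly.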

If you think about it from Facebook’s perspective, their goal probably isn’t to show ads that understand you on some personal level, but to show ads that you’ll actually click on. How many ads does the typical person see, vs the number they click on? Suppose I never click on any ads. Then the HRT ads might be a miss, but then so is every other ad that Facebook shows me, so the algorithm hasn’t actually lost much by giving it a shot.
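The arithmetic behind that can be sketched in a few lines. This is a simplified model with assumed numbers, not a description of how Facebook actually picks ads: `opportunity_cost` is a hypothetical helper that compares the ad shown against the best alternative ad, measured in expected clicks.

```python
def expected_clicks(impressions, click_prob):
    """Expected number of clicks if each impression is clicked with click_prob."""
    return impressions * click_prob


def opportunity_cost(impressions, chosen_prob, best_alternative_prob):
    """Expected clicks given up by showing the chosen ad instead of the best alternative."""
    return expected_clicks(impressions, best_alternative_prob) - expected_clicks(
        impressions, chosen_prob
    )


# Someone who never clicks on anything: every ad is a miss, so a bad guess costs nothing.
never_clicks = opportunity_cost(impressions=100, chosen_prob=0.0, best_alternative_prob=0.0)
print(f"never-clicker: {never_clicks:.2f} clicks lost")

# Someone who rarely clicks (assumed 0.1% vs 0.2%): the loss is a fraction of one click.
rarely_clicks = opportunity_cost(impressions=100, chosen_prob=0.001, best_alternative_prob=0.002)
print(f"rare clicker: {rarely_clicks:.2f} clicks lost")
```

When click rates are this low across the board, a mistargeted ad is a very cheap experiment.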

So data science algorithms are quite frequently wrong simply as a matter of course. But why? Why can’t the algorithm see something that would be so obvious to any human reviewer?

Link Roundup: March 2022

CGP Grey was WRONG | CGP Grey (video, 18 min) – An old video, but I really liked this discussion of errors made in the context of content creation.  It can be a lot of work to get things right, but then as soon as you hit publish some expert immediately appears to point out the glaring problem. But this dynamic scales weirdly with popularity.  When you’re obscure, it hardly matters what you say, and there aren’t always experts around to correct you; but when you’re popular you have to spend a lot of time getting it right the first time.

The Worst Double Standard in Gaming | Graythorn (video, 21 min) – This video points out that MMORPGs and life simulation games are quite similar, but the former tend to have more gamer cred.  Graythorn then analyzes the differences in the genres to infer what game elements are associated with greater “legitimacy”.

The Bisexual Gimmick | A Deep Dive into Bisexual Reality Television | verilybitchie (video, 1 hr 30 min) – Verity Ritchie goes through a list of reality television shows that have used bisexuality as a gimmick, from the conscientious to the sensational.  Guess which shows were most popular.  A fascinating study of the many issues in bisexual media representation.  I particularly liked the discussion of monogamy as it’s understood in reality television.  Like, they’re clearly not monogamous, but they have this fiction that it’s all monogamous because monogamy is the end goal.

Origami: Rhombus Weave


Rhombus Weave, designed by Eric Gjerde

At this point, I have almost 10 years of origami photos to choose from, and though the pace of my artwork has slowed during the pandemic, I still have a large number of photos in my backlog.  (You can, of course, find them all through the link on my sidebar.)  This one comes from an earlier era when I didn’t care what was in the background, because the photos were only for myself, and the backgrounds added flavor.  A few of these are embarrassing, but I actually like this one because it’s the balcony view from my old apartment.

This origami tessellation comes from Eric Gjerde’s classic book, Origami Tessellations, definitely recommended if you ever want an introduction.  The pattern on the paper has horizontal lines, but the rhombuses are twisted so that the lines undulate up and down.