Algorithms are only magical oracles if you don’t understand them


I saw a lot of news flashes about twins comparing DNA testing services and finding that they weren’t perfectly identical, and that the services didn’t produce identical results. I didn’t bother to look any deeper, because yes, identical twins do have a small number of genetic differences, and those testing services don’t sequence your genome; they rely on chips to identify some short sequences from a subset of your genome, and there is naturally some sampling error in the process. So this shouldn’t be a surprise.

Fortunately, Larry Moran explains the sources of error.

The main problem by far is due to the way the tests are done, which is by hybridizing the customers’ DNA to DNA on a microchip and reading the chip to see if there’s a match. (Ancestry.com uses the latest Illumina microchip that assays 700,000 SNPs.) I think the rate of false positives is quite low, but the rate of false negatives is about 2% according to 23andMe. The absence of a match where there should be one can be due to bad luck and differences in the threshold level of binding that constitutes a “hit.” It’s these “no-reads” that make up most of the false negatives. Because of these limitations of the assay, the twins’ DNA results could differ by 2-4% of the SNPs being tested.

So no surprise that they reported some variation. What I found odd is that anyone found this odd at all.
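You can even put numbers on it. Here’s a back-of-the-envelope simulation using Moran’s figures above (a 700,000-SNP chip, a 2% no-read rate), with the simplifying assumption that each SNP fails independently:

```python
import random
random.seed(1)

N_SNPS = 700_000   # the Illumina chip size Moran mentions
NO_READ = 0.02     # 23andMe's reported false-negative ("no-read") rate

def chip_assay(true_genotypes):
    """One chip run: each SNP independently fails to read ~2% of the time."""
    return [g if random.random() > NO_READ else None for g in true_genotypes]

# Identical twins share essentially the same true genotype at every SNP.
truth = [random.choice("ACGT") for _ in range(N_SNPS)]
twin_a = chip_assay(truth)
twin_b = chip_assay(truth)

# The reports disagree at a SNP whenever one assay dropped it and the other didn't.
discordant = sum(a != b for a, b in zip(twin_a, twin_b))
print(f"{discordant / N_SNPS:.1%} of tested SNPs differ between the twins' reports")
# Expect about 2 * 0.02 * 0.98 = 3.9% -- squarely in Moran's 2-4% range.
```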

The different testing services also reported different patterns of ancestry. Why would anyone find that to be unexpected?

While he can’t say for certain what accounts for the difference, Gerstein suspects it has to do with the algorithms each company uses to crunch the DNA data.

“The story has to be the calculation. The way these calculations are run are different.”

Heh. I believe I’ve mentioned this very point here: that saying something is an “algorithm” doesn’t mean it’s bias-free. The inputs, the weights on the data, and the processing used are all choices made by the person who designed the algorithm, and different companies will have different pools of data they are drawing on to make their decisions. Some people don’t get that, though.
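Here’s how little it takes. In this toy sketch, every reference-panel frequency and weight is invented; the point is only that two services scoring the identical genotype make different design choices and so report different ancestry:

```python
# One customer's genotype: allele counts at four SNPs (all values made up).
customer = {"snp1": 1, "snp2": 0, "snp3": 2, "snp4": 1}

# Each hypothetical company chose its own reference panel and SNP weights.
company_a = {"panel":   {"snp1": 0.9, "snp2": 0.1, "snp3": 0.8, "snp4": 0.5},
             "weights": {"snp1": 1.0, "snp2": 1.0, "snp3": 1.0, "snp4": 1.0}}
company_b = {"panel":   {"snp1": 0.7, "snp2": 0.6, "snp3": 0.6, "snp4": 0.4},
             "weights": {"snp1": 2.0, "snp2": 0.5, "snp3": 1.0, "snp4": 0.2}}

def ancestry_score(genotype, company):
    """A made-up similarity score: weighted agreement with the reference panel."""
    total = sum(company["weights"].values())
    hits = sum(w for snp, w in company["weights"].items()
               if (genotype[snp] >= 1) == (company["panel"][snp] >= 0.5))
    return hits / total

for name, co in [("Company A", company_a), ("Company B", company_b)]:
    print(f"{name}: {ancestry_score(customer, co):.0%} 'Population X'")
# Same input, different reference data and weights: 100% vs. 81%.
```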

This should be used as a nice example of how datasets and algorithms can color the interpretation of data. Maybe we’ll see fewer asshats buying into digital reductionism, as if everything that comes out of a computer is inarguable truth.

Comments

  1. ardipithecus says

    He’s doing us a favour by demonstrating his complete and utter failure to understand how data crunching algorithms work.

  2. whheydt says

    My wife wandered past and commented that the title for this post was an application of Clarke’s Third Law. (And for those unfamiliar with it, it’s “Any sufficiently advanced technology is indistinguishable from magic.”)

  3. mond says

    Funny how the algorithm I use to make a cup of tea seems to produce a cuppa just the way I like.
    It’s almost as if I put my own preferences and biases into the algorithm.
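    Something like this, with my preferences baked in as the defaults (a made-up sketch, of course):

    ```python
    def make_tea(steep_minutes=4, milk=True, sugars=0):
        """My tea algorithm: note that every default is my own preference."""
        steps = ["boil the kettle", "warm the pot", "add the tea"]
        steps.append(f"steep for {steep_minutes} minutes")
        if milk:
            steps.append("add milk")
        steps.extend(["add a lump of sugar"] * sugars)
        return steps

    for step in make_tea():   # someone else's defaults, someone else's cuppa
        print(step)
    ```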

  4. Artor says

    “Socialist Rep. Alexandria Ocasio-Cortez (D-NY) claims that algorithms, which are driven by math, are racist.”

Algorithms: driven by math, but written by people.

  5. robert79 says

    One of my main frustrations about reporting is that many media (both online and offline) seem to think the word “algorithm” is a synonym for “something that cannot be understood”, “stuff with maths” or “magic”. The long division you learned in elementary school is an algorithm, as is the algorithm for a cup of tea that mond described above.

    I forbid my maths students to use the word algorithm unless they can actually describe the algorithm.
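    For example, here’s that long division as an explicit, describable sequence of steps (a sketch of the standard digit-by-digit method):

    ```python
    def long_division(dividend, divisor):
        """The primary-school long-division algorithm, one digit at a time."""
        quotient, remainder = "", 0
        for digit in str(dividend):
            remainder = remainder * 10 + int(digit)  # bring down the next digit
            quotient += str(remainder // divisor)    # how many times does it go?
            remainder %= divisor                     # carry the rest along
        return int(quotient), remainder

    print(long_division(7421, 6))   # (1236, 5), since 7421 = 6 * 1236 + 5
    ```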

  6. chris61 says

    This should be used as a nice example of how datasets and algorithms can color the interpretation of data.

    Clever.

  7. chrislawson says

    Artor@6–

    Algorithms are not necessarily driven by math. Algorithms are simply sets of discrete, ordered* instructions. Algorithms are crucially important in mathematics and vice versa, but do not map 1:1. My definition of an algorithm is something along these lines, adapted from this hackernoon article:

    Algorithm: a sequence of steps that describes an idea for solving a problem, an abstract recipe for a solution independent of implementation.
    vs.
    Code: a set of instructions for a computer. A concrete implementation of the algorithm on a specific platform in a specific programming language.

    An example of non-mathematical algorithms is Western sheet music, which tells you the sequence of notes to be played and includes control commands (e.g. time signature, key, dynamic instructions like “diminuendo”) and recursive elements like repeat brackets.

    *the order does not have to be linear; almost all algorithms in practical use are non-linear and multiply recursive.
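    To make the algorithm/code distinction concrete, here is one possible implementation of the sheet-music example; the toy score and instruction names are invented for the sketch:

    ```python
    # The "score" is the algorithm: an abstract, ordered recipe with a repeat
    # bracket. The interpreter below is merely one concrete implementation.
    score = [
        ("note", "C"), ("note", "E"),
        ("repeat_start",), ("note", "G"), ("note", "A"), ("repeat_end", 2),
        ("note", "C"),
    ]

    def play(score):
        """Walk the score, honouring repeat brackets as simple control flow."""
        i, repeat_from, plays_left = 0, None, None
        while i < len(score):
            instr = score[i]
            if instr[0] == "note":
                print("play", instr[1])
            elif instr[0] == "repeat_start":
                repeat_from, plays_left = i, None
            elif instr[0] == "repeat_end":
                if plays_left is None:
                    plays_left = instr[1] - 1   # the bracket was just played once
                if plays_left > 0:
                    plays_left -= 1
                    i = repeat_from             # jump back: repetition, no math
            i += 1

    play(score)   # C E G A G A C
    ```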

  8. bryanfeir says

    My favourite story about this was something I heard back in University (so over 25 years ago now). I may have mentioned it before. And no, I don’t remember enough details to be able to pin this down, so feel free to take this with however many grains of salt required.

    So, a school’s grad program had been under fire for racism and sexism in their selection process. One common method of outside checking for this sort of thing involves sending c.v.’s that are pretty much identical except for the names, and seeing if the ‘obviously’ female or non-WASP names get lower rates of follow-ups. This school had failed a test like that pretty badly, and decided to try and automate the first pass of the selection and remove the human element from it, as part of renovating their image.

So, what they did was take a bunch of their old c.v.’s and selection criteria, feed them into a neural network along with whether or not each student was accepted, and train the network to decide which c.v.’s were acceptable for further review by a human. (Some of you have already figured out where this is going.)

    When the same sort of test was performed on the new system, it failed again. The administrators claimed that wasn’t possible, because the machine couldn’t be racist or sexist. But, of course, when they trained the neural network, they had included the whole c.v…. which included the name of the applicant. So the neural network had learned that given two otherwise identical c.v.’s, the one with the female or non-WASP-sounding name was to be given lower priority.

    Needless to say, the network had to be completely retrained with anonymized data.
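    Since I can’t pin down the original, here’s a toy reconstruction of the failure mode; the data, the old committee’s decision rule, and the little trainer are all invented for illustration:

    ```python
    import math
    import random
    random.seed(0)

    # Synthetic "historical" admissions: qualifications are random, but the old
    # committee's rule docks anyone whose name reads as female or non-WASP.
    def historical_decision(gpa, research, flagged_name):
        score = 2.0 * gpa + 1.0 * research - (1.5 if flagged_name else 0.0)
        return 1 if score > 1.5 else 0

    people = [(random.random(), random.random(), random.random() < 0.5)
              for _ in range(2000)]
    labels = [historical_decision(*p) for p in people]

    def train(rows, labels, lr=0.1, epochs=50):
        """Tiny logistic regression by stochastic gradient descent."""
        w, b = [0.0] * len(rows[0]), 0.0
        for _ in range(epochs):
            for x, y in zip(rows, labels):
                z = sum(wi * xi for wi, xi in zip(w, x)) + b
                err = 1.0 / (1.0 + math.exp(-z)) - y
                w = [wi - lr * err * xi for wi, xi in zip(w, x)]
                b -= lr * err
        return w

    full = train([(g, r, 1.0 if f else 0.0) for g, r, f in people], labels)
    anon = train([(g, r) for g, r, _ in people], labels)
    print("whole c.v. (name included):", [round(wi, 2) for wi in full])
    print("anonymized c.v.:           ", [round(wi, 2) for wi in anon])
    # The third weight in the first run comes out strongly negative: trained on
    # the whole c.v., the model rediscovers the old committee's bias from names.
    ```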

  9. methuseus says

    @mikehuben #12:
I just started reading this a week ago, thanks to PZ mentioning someone who hated having to read her book for a data science class.

As I was thinking, this all comes back to the same point: people are actually surprised that we can write programs that reflect our internal biases??

  10. John Morales says

    methuseus, yes, because algorithms are but information machines, and machines don’t have consciousness, so they can’t be biased.

    (The information upon which they are constituted, though…)

  11. says

I second mikehuben’s recommendation for “Weapons of Math Destruction.” A math seminar class I took last year at Central Washington University used it as a launching point to discuss how technology that aims to end bias can actually make it more deeply entrenched by hiding it behind a black box. Who gets paroled, who gets employed, who gets an apartment, who gets into school… it’s really insidious.