I saw a lot of news flashes about twins comparing DNA testing services and finding that they weren’t perfectly identical, and that the services didn’t produce identical results. I didn’t bother to look any deeper, because yes, identical twins do carry a small number of genetic differences, and those testing services don’t sequence your genome: they rely on chips to identify a set of short sequences from a small subset of your genome, and there is naturally some sampling error in the process. So none of this should be a surprise.
Fortunately, Larry Moran explains the sources of error.
The main problem by far is due to the way the tests are done, which is by hybridizing the customers’ DNA to DNA on a microchip and reading the chip to see if there’s a match. (Ancestry.com uses the latest Illumina microchip that assays 700,000 SNPs.) I think the rate of false positives is quite low but the rate of false negatives is about 2% according to 23andMe. The absence of a match where there should be one can be due to bad luck and differences in the threshold level of binding that constitutes a “hit.” It’s these “no-reads” that make up most of the false negatives. Because of these limitations of the assay, the twins’ DNA results could differ by 2-4% of the SNPs being tested.
So no surprise that they reported some variation. What I found odd is that anyone found this odd at all.
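The arithmetic here is easy to check with a toy simulation. This is only a sketch of the error model Moran describes, under the simplifying assumption that each SNP independently fails to read with 2% probability on each twin’s chip; a call on one chip then disagrees with the other whenever exactly one of the two reads fails, which happens at roughly 2 × 0.02 × 0.98 ≈ 3.9% of sites — squarely in the 2-4% range quoted above.

```python
import random

random.seed(0)

N_SNPS = 700_000   # SNPs assayed, per the Illumina chip mentioned above
NO_READ = 0.02     # assumed per-SNP false-negative ("no-read") rate

def assay(true_calls, no_read_rate):
    # Each SNP independently fails to produce a call with the given probability.
    return [c if random.random() > no_read_rate else None for c in true_calls]

# Identical twins share the same underlying genotype...
true_calls = [random.choice("ACGT") for _ in range(N_SNPS)]

# ...but each twin's sample goes through the noisy assay separately.
twin_a = assay(true_calls, NO_READ)
twin_b = assay(true_calls, NO_READ)

# Fraction of SNPs where the two reports disagree (one read, one no-read).
disagree = sum(1 for a, b in zip(twin_a, twin_b) if a != b) / N_SNPS
print(f"Twins' chip results differ at {disagree:.1%} of SNPs")
```

No biology is being disputed by the chip: the twins’ DNA is the same going in, and the 3-4% discrepancy falls out of the assay alone.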
The different testing services also reported different patterns of ancestry. Why would anyone find that to be unexpected?
While he can’t say for certain what accounts for the difference, Gerstein suspects it has to do with the algorithms each company uses to crunch the DNA data.
“The story has to be the calculation. The way these calculations are run are different.”
Heh. I believe I’ve mentioned this very point here: saying something is an “algorithm” doesn’t mean it’s bias-free. The inputs, the weights on the data, and the processing steps are all choices made by whoever designed the algorithm, and different companies draw on different pools of reference data to make their decisions. Some people don’t get that, though.
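To see how the choice of reference data alone changes the answer, here is a deliberately tiny, hypothetical illustration (the allele frequencies and the naive likelihood model are invented for this sketch, not anything the actual companies use): two “companies” score the exact same genotype against their own reference panels of allele frequencies for the same two populations, and report different ancestry percentages.

```python
import math

# The same customer's allele calls (1 = allele present, 0 = absent) at five toy SNPs.
genotype = [1, 0, 1, 1, 0]

def ancestry_share(genotype, freqs_pop1, freqs_pop2):
    # Naive model: log-likelihood of the calls under each population's
    # allele frequencies, then a posterior weight on pop1 with a flat prior.
    ll1 = sum(math.log(f if g else 1 - f) for g, f in zip(genotype, freqs_pop1))
    ll2 = sum(math.log(f if g else 1 - f) for g, f in zip(genotype, freqs_pop2))
    p1, p2 = math.exp(ll1), math.exp(ll2)
    return p1 / (p1 + p2)

# Company A's reference panel says the two populations differ at every SNP...
company_a = ancestry_share(genotype,
                           [0.6, 0.4, 0.6, 0.6, 0.4],
                           [0.4, 0.6, 0.4, 0.4, 0.6])

# ...Company B's panel says they differ meaningfully at only one SNP.
company_b = ancestry_share(genotype,
                           [0.5, 0.5, 0.6, 0.5, 0.5],
                           [0.5, 0.5, 0.4, 0.5, 0.5])

print(f"Company A: {company_a:.0%} pop1; Company B: {company_b:.0%} pop1")
# → Company A: 88% pop1; Company B: 60% pop1
```

Identical input, two defensible-looking reference panels, and the customer walks away with 88% ancestry from one service and 60% from the other. The “calculation” is doing exactly what it was designed to do in both cases.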
Socialist Rep. Alexandria Ocasio-Cortez (D-NY) claims that algorithms, which are driven by math, are racist pic.twitter.com/X2veVvAU1H
— Ryan Saavedra (@RealSaavedra) January 22, 2019
This should be used as a nice example of how datasets and algorithms can color the interpretation of data. Maybe we’ll see fewer asshats buying into digital reductionism, as if everything that comes out of a computer is inarguable truth.