Another good article on sociogenomics

Do you want to know how sociogenomics works? Here you go.

If this is “the science,” the science is weird. We’re used to thinking of science as incrementally seeking causal explanations for natural phenomena by testing a series of hypotheses. Just as important, good science tries as hard as it can to disprove the working hypotheses.

Sociogenomics has no experiments, no null hypotheses to accept or reject, no deductions from the data to general principles. Nor is it a historical science, like geology or evolutionary biology, that draws on a long-running record for evidence.

Sociogenomics is inductive rather than deductive. Data is collected first, without a prior hypothesis, from longitudinal studies like the Framingham Heart Study, twin studies, and other sources of information—such as direct-to-consumer DNA companies like 23andMe that collect biographical and biometric as well as genetic data on all their clients.

Algorithms then chew up the data and spit out correlations between the trait of interest and tiny variations in the DNA, called SNPs (for single-nucleotide polymorphisms). Finally, sociogenomicists do the thing most scientists do at the outset: they draw inferences and make predictions, primarily about an individual’s future behavior.

Sociogenomics is not concerned with causation in the sense that most of us think of it, but with correlation. The DNA data often comes in the form of genome-wide association studies (GWASs), a means of comparing genomes and linking variations of SNPs. Sociogenomics algorithms ask: are there patterns of SNPs that correlate with a trait, be it high intelligence or homosexuality or a love of gambling?

Yes—almost always. The number of possible combinations of SNPs is so large that finding associations with any given trait is practically inevitable.
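To make that inevitability concrete, here is a toy simulation. It is a sketch under loud assumptions, not anyone's actual pipeline: the genotypes and the trait are pure random noise, the significance test is a simple normal approximation, and the threshold is an uncorrected p < 0.05. The function names (`pearson_r`, `p_value`, `count_spurious_hits`) and all the parameters are made up for illustration.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column cannot correlate with anything
    return cov / math.sqrt(var_x * var_y)


def p_value(r, n):
    """Two-sided p-value for r via the Fisher z-transform (normal approximation)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_spurious_hits(n_people=200, n_snps=2000, alpha=0.05, seed=0):
    """Count random SNPs that 'correlate' with a random trait at p < alpha."""
    rng = random.Random(seed)
    # The trait is noise: by construction it has no genetic component at all.
    trait = [rng.gauss(0, 1) for _ in range(n_people)]
    hits = 0
    for _ in range(n_snps):
        # Each SNP is coded as 0, 1, or 2 copies of an allele, drawn
        # independently of the trait.
        snp = [rng.choice((0, 1, 2)) for _ in range(n_people)]
        if p_value(pearson_r(snp, trait), n_people) < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious_hits()
    print(f"{hits} of 2000 random SNPs 'correlate' with a random trait at p < 0.05")
```

With nothing real to find, roughly one SNP in twenty should still clear the bar by chance, and the number of hits grows in step with the number of SNPs you test.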

I’m not just being mean when I say it’s garbage science. “Chewing up data and spitting out correlations,” especially when correlations are ubiquitous, is not a productive approach to much of anything.

Where will it take us? That’s easy to see.

Advocates of sociogenomics envision a prospect that not everyone will find entirely benevolent: health “report cards,” based on your genome and handed out at birth, that predict your risk of various diseases and propensity for different behaviors. In the new social sciences, sociologists will examine the genetic component of educational attainment and wealth, while economists will envision genetic “risk scores” for spending, saving, and investment behavior.

Without strong regulation, these scores could be used in school and job applications and in calculating health insurance premiums. Your genome is the ultimate preexisting condition.

There’s precedent. The article mentions how Simon Binet invented the IQ test as a tool to identify and help students who were lagging in school…and then within decades discovered “that people were being sterilized for scoring too low”. I know that if I’d been assigned a genetic “risk score” with my family history, I and my brothers and sisters would have been doing manual labor for our short lives.

Also, I still want to know how this pseudonymous eugenics research program with its 15 new hires of “young, often charismatic scientists” is getting funded. Following the money would be a good idea here.


  1. Pierce R. Butler says

    “This individual has genes for a high melanin level, and thus a well-above-average probability of high-velocity lead poisoning.”

  2. says

    The scientific method is simple: you collect discrete empirical measurements of some phenomenon. Those measurements, like all measurements, are stochastic and reflect a presumed ‘true value’ that is unseen. A collection of discrete measurements is then assumed to reflect some continuous phenomenon that we hypothesize through a mathematical formula. That mathematical formula ipso facto predicts other discrete results, which we then try to confirm by experiment. If the results hold for several predictions, then we assume the hypothesis is ‘proven’ because we can use it to represent the data of the phenomenon. It doesn’t matter that our measurements are stochastic, only that they can be shown to be within 3 standard deviations of a predicted value.

  3. says

    …health “report cards,” based on your genome and handed out at birth, that predict your risk of various diseases and propensity for different behaviors.

    Something like that is already being done: your doctor has at least some information — whatever they considered relevant, at least — about your genome and other observed physical factors. Which is perfectly okay, as long as it’s only seen by your actual doctors who are dealing with your medical issues. Giving this information to sociologists and economists, and imposing on them the assumption that it’s relevant and useful to them, is — besides a gross violation of privacy — a sure-fire recipe for lots of bad decisions that affect other people’s lives and opportunities.

  4. lanir says

    In the new social sciences, sociologists will examine the genetic component of … wealth

    Sounds self-refuting to me. It’s bad enough to confuse correlation with causation. But to follow it up with a claim that “rich people have rich babies” and weirdly correlate that with genetics rather than say… getting handed piles of money? Yeah, that’s just ridiculous. And how would you exclude that as an outlier while keeping your other assertions without making some arbitrary rule?

  5. Matt G says

    Reminds me of a story I learned from the Skeptic’s Guide to the Universe podcast. A study looked for higher disease incidence in people who lived near high-voltage power lines (in Denmark?). They found a correlation for one disease out of a list of 200. The problem will be obvious to readers of Pharyngula. See also the classic Green Jellybean xkcd comic.

  6. says

    I have chewed up the data, and I have found 10 incredible parallels between Abraham Lincoln and John F. Kennedy.
    I am a scientist.

  7. dangerousbeans says

    Sounds like the old protestant prosperity gospel grift but with more SCIENCE!

  8. chrislawson says


    That is a very good description of the reductive part of science, which is necessary but not sufficient to explain the scientific method. I would also argue against using mathematics as the defining feature of a good hypothesis as this would exclude from ‘science’ the work that identified the DNA helix, not to mention all of early evolutionary theory. In some famous papers such as John Snow’s cholera study, the only maths involved was counting up numbers of cases and putting them on a map. Also, plenty of very important science is not hypothesis testing — much of the data published by the CDC, WHO, etc. is purely cross-sectional.

    Having said that, you’re completely correct that trawling databases for tiny, almost certainly spurious correlations and then calling them causes without further investigation doesn’t even meet a basement-level standard for science.

  9. chrislawson says


    I have been too lazy to try it, but I have no doubt at all that I could find similar correlations with the letters in a person’s surname, or street numbers of their home addresses. That is, these things are also strongly inherited.

  10. chrislawson says

    And another thing…using data from 23andMe and other genealogical DNA companies is fraught with its own sets of biases, most notably because these companies are not primarily doing scientific research (to be fair, they’re not claiming to), so there is no quality control on the correlating data. They don’t even know for sure that the person sending in a sample is the person it was collected from! So the very fact that some of these studies use 23andMe as a data source shows that they’re not serious about reliability.

  11. chrislawson says

    Matt G @5–

    I used to use that paper to teach students about p-fishing (aka p-hacking).

  12. chrislawson says

    Final observations–

    Non-specific SNP testing is a very powerful tool for tracking lineages and finding known mutations, but almost nothing else. For readers unfamiliar, SNPs are one-point variations in a genome. For example, at a specific base pair on a given place in the genome, you might find that 45% of people have a G, 20% have an A, 10% have a T, and the remainder have a C. In any person’s DNA sample, you will end up with 4-5 million individual SNP sites with known variations. In most genealogy tests, they check around 700,000 of these. Which makes it statistically almost impossible to share 50% of your SNP profile with anyone who isn’t a first-degree relative (or second-to-third degree with inbreeding). Great for researching family trees. SNPs are also very useful for identifying genetic conditions when we already have the data showing that specific SNP is causal to the condition, e.g. the spot mutations that cause cystic fibrosis.

    But when the SNPs are not already known to be specific to a condition (not necessarily a disease, it can be cilantro tasting soapy, or even a neutral variation), then they are useless for hypothesis testing. They can be very useful for hypothesis generation. That is, ‘we found a correlation here, now let’s see if it’s meaningful’. But the sociogenomics people seem to ignore the second part of that sentence.

    And finally, a note about genealogy testing using SNPs. As I said, great for tracking specific relations in a family tree because it’s literally nothing more than counting up the matches between SNPs with no interpretation required for what those SNPs mean. But it’s way overblown for testing your ethnic origins.

    This is another mark against using private genealogy company data — these companies are in the business of making people feel good about themselves, so they are known to have overstated the likelihood of having popular ethnicities in their analyses.

    Essentially, SNP testing is a very powerful tool, but as soon as you start assigning meaning to SNP profiles, you have a lot of hard work to do to establish that meaning as real and not spurious.

  13. kome says


    Oh, you should look into the Minnesota Study of Twins Reared Apart (MISTRA) nonsense. The researchers behind it have, multiple times over the years, made the claim that even alleged environmental contributions to similar scores on measures of intelligence by monozygotic twins were really genetic contributions because “their identical genomes make it probable that their effective environments are similar,” and “the environments of individuals are significantly fashioned by their genotypes.”

    It’s genetic reductionism and determinism all the way down, and all in service to eugenics because that is literally the only thing that benefits from this kind of approach to doing science. Ultimately, eugenics is all any of this has ever been or will be about, whether it calls itself eugenics, sociobiology, evolutionary psychology, behavioral genetics, or sociogenomics. As the biologist-turned-philosopher Massimo Pigliucci said in a response to a paper advocating race realism: “Of course, anyone who has seriously looked into this endless debate [about the biological basis of race] knows very well that here is where the stakes really lie: it is not about small genetic differences that may or may not help build a more individualized medicine; it is not about forensic anthropologists and how well they do their work; it is about claims that one race has superior or inferior intellectual capabilities than other ones…”

    And too many of us in the scientific community – a community that is disproportionately cishet, able-bodied, white men from a middle- or upper-class background – simply accept it without question, because gosh darn it all to heck, those are the same people that research always concludes are the most superior of all of us. Funny how that works.

  14. StevoR says

    Advocates of sociogenomics envision a prospect that not everyone will find entirely benevolent: health “report cards,” based on your genome and handed out at birth, that predict your risk of various diseases and propensity for different behaviors. In the new social sciences, sociologists will examine the genetic component of educational attainment and wealth, while economists will envision genetic “risk scores” for spending, saving, and investment behavior.

    Without strong regulation, these scores could be used in school and job applications and in calculating health insurance premiums. Your genome is the ultimate preexisting condition.

    So basically the world of GATTACA.

  15. drew says

    Whether it’s bunk or not, the field will be buried by ChatGPT, an engine smarter than all of us at making sometimes plausible connections.

  16. says

    drew has a (scary) point: “sociogenomics” is nothing more than finding whatever correlations one wants to find, which ChatGPT can easily do — forever, and forever building on its own previous iterations. And even if human experts could distinguish a ChatGPT essay on the subject from a human-written one, it won’t make any difference in terms of the actual validity of the arguments made.

  17. Kagehi says

    Yeah. Kind of two minds about this sort of thing. The problem, as I see it, is that they stop half way, and really can’t go any farther with it. Basically, “Yep, we have found correlations, but we can’t monkey with individual genes in an actual experiment, to see if we get the result we want, so we will just assume we have found a true correlation.” This doesn’t mean they haven’t, but it also doesn’t mean they have found a damn thing at all.

    On a side note.. I had an odd thought on something I read recently about long Covid, and the fact that some other viral infections can cause similar long term results (and the interesting fact that these results show up in some form in Chronic Fatigue Syndrome, and even the semi-dubious Fibro Melagia – or how ever you spell that). What occurred to me is that fatigue could be caused by a malfunction in regulation of sleep cycles, and pain is something that happens “in the brain”, not localized to the receptors, and is, I assume, handled by the same “more primitive” parts of the brain that regulate sleep cycles and how awake you are. So.. two possibilities come to mind: 1) The viruses are themselves messing something up in the machinery of that part of the brain, causing these symptoms in some people. 2) All of them are being caused by some class of autoimmune problem, which is a result of the specific markers on brain cells from that specific part of the brain being misidentified as similar to the viral agent the body just fought off. These can even explain the wide variance in effect, since the level of response an individual’s body has to such a malfunction can vary widely, from mild to debilitating.

    It would also explain why, while it’s starting to be recognized that these symptoms are related to prior infections (and I suspect we may find that the Fibro watzit stuff is the same thing), they have been unable to find a clear cause for them yet.

    But, this would be an example of the same sort of science being done “right” – there is a wide number of symptoms, including those associated with unidentified diseases, which were initially rejected as nonsense, the correlation of which was found, only after Covid hit, as “likely caused by viral infections”. We still don’t know the mechanism, we don’t know if all of them have that cause, but there is mounting evidence that it is likely, and someone is bound to, at some point, find the mechanism, now that we know what we should be looking for – i.e. what the virus, or possibly immune system while fighting the virus, did to cause it.

    But, as I said – this is good science. Just finding a possible genetic correlation, and then going, “Yeah.. Can’t test that. Guess we just assume we have the right result!”, is not.

  18. chrislawson says


    You don’t need human genetic engineering studies to work out the function of genes. All of the conditions listed in the OMIM database were worked out without them. Working out gene function is a big job, even in simple cases like profoundly single-gene conditions, and the identification of candidate genes/mutations is the easy part. (And even then, the sociogenomics people are not doing the ‘easy’ part right as they’re trawling the general population.)

    When I say ‘easy’, I mean compared to the work that follows.

  19. Kagehi says

    @18 I think you missed my point – even if their studies were useful, they would only be useful if “testable”. Merely finding a seeming correlation, then leaping to the conclusion it must mean something is fundamentally flawed. However, as I said, “IF” there is a testable hypothesis, the means by which you determine what to look for is bound to be based on some combination of, “Do the genes I want to look at have anything to do with X in the first place, based on what we already know?”, AND, “Does there seem to be an anomaly some place in those genes, in people with X trait?” The complaint “both” of us have with these sorts of studies is that they do the second part, conclude they must be right without testing, and can’t even show if the gene(s) in question have any plausible connection.

    It’s not impossible for them to find something this way, but it is impossible for them to test it, and it is, very much, in a sense a form of, “Just because you are incidentally correct, doesn’t make your method invalid, and your assertions still a lie.”