Biology is a hard problem

New genetic disorders pop up all the time, and each one represents a child who may face incredible challenges, or even be doomed to an early death. A child named Bertrand exhibited some serious symptoms, including profound developmental disabilities, shortly after he was born, and no one could figure out what was wrong with him. So researchers took advantage of 21st-century biotechnology: they sequenced his genome and the genomes of both of his parents, and asked what novel mutations the child carried.

For years, sequencing was too expensive for common use—in 2001, the cost of sequencing a single human genome was around a hundred million dollars. But by 2010, with the advent of new technologies, that figure had dropped by more than ninety-nine per cent, to roughly fifty thousand dollars. To reduce costs further, the Duke researchers, including Shashi and a geneticist named David Goldstein, planned to sequence only the exome—the less than two per cent of the genome that codes for proteins and gives rise to the vast majority of known genetic disorders. In a handful of isolated cases, exome sequencing had been successfully used by doctors desperate to identify the causes of mysterious, life-threatening conditions. If the technique could be shown to be more broadly effective, the Duke team might help usher in a new approach to disease discovery.

For their study, Shashi, Goldstein, and their colleagues assembled a dozen test subjects, all suffering from various undiagnosed disorders. There were nine children, two teen-agers, and one adult; their symptoms included everything from spine abnormalities to severe intellectual disabilities. The researchers began by sequencing each patient and both biological parents—what’s known as a parent-child trio. There are between thirty and fifty million base pairs in the human exome; the average child’s exome differs from each of his parents’ in roughly fifteen thousand spots. The researchers could dismiss most of those variations—either they corresponded to already known conditions, or they occurred frequently enough in the general population to rule out their being the cause of a rare disease, or they were involved in biological processes that were unrelated to the patient’s symptoms. That left a short list of about a dozen genes for each patient.
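A back-of-the-envelope check shows why almost all of those roughly fifteen thousand differences must be inherited rather than new. This minimal sketch assumes an exome of 30 to 50 million base pairs (the figure quoted above) and a germline mutation rate of about 1.1 to 2.5 × 10^-8 per base per generation (a commonly cited range, used here as an assumption), and it counts both copies of the child's exome.

```python
# Back-of-the-envelope estimate of new (de novo) mutations in a child's exome.
# Assumption: the size and rate ranges below are illustrative round numbers,
# not measurements from the Duke study.

EXOME_BP = (30e6, 50e6)           # haploid exome size range, in base pairs
MUTATION_RATE = (1.1e-8, 2.5e-8)  # new mutations per base per generation
OBSERVED_DIFFS = 15_000           # reported child-vs-parent differences

def expected_de_novo(exome_bp: float, rate: float) -> float:
    """Expected new mutations across both inherited copies of the exome."""
    return 2 * exome_bp * rate    # two copies: one from each parent

low = expected_de_novo(EXOME_BP[0], MUTATION_RATE[0])
high = expected_de_novo(EXOME_BP[1], MUTATION_RATE[1])

print(f"expected de novo exome mutations: {low:.2f} to {high:.2f}")
print(f"share of the ~{OBSERVED_DIFFS:,} differences: "
      f"{low / OBSERVED_DIFFS:.1e} to {high / OBSERVED_DIFFS:.1e}")
```

Under these assumptions a child carries only about one or two brand-new exome mutations, so nearly all of the differences from each parent reflect which of that parent's two alleles were passed on (plus sequencing error), which is where the commenters end up as well.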

In Bertrand’s case, they narrowed it down to one gene likely responsible for his condition, a gene for which each of his parents also carried a variant, in both cases paired with a normal, functional allele. Bertrand was unlucky: he inherited one bad copy from his mother, and another bad (but different) copy from his father.
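That inheritance pattern is compound heterozygosity under an autosomal recessive model. A minimal sketch enumerates the Punnett square for two carrier parents; the allele labels are hypothetical placeholders, not the actual NGLY1 variants.

```python
from fractions import Fraction
from itertools import product

# Hypothetical allele labels: "N" = normal, functional allele;
# "m1" and "m2" = two different loss-of-function variants of the same gene.
mother = ("N", "m1")
father = ("N", "m2")

def offspring_genotypes(mom, dad):
    """Enumerate the four equally likely allele combinations (a Punnett square)."""
    return [tuple(sorted(pair)) for pair in product(mom, dad)]

def is_affected(genotype):
    """Recessive model: disease only when no functional copy is present."""
    return "N" not in genotype

kids = offspring_genotypes(mother, father)
p_affected = Fraction(sum(is_affected(g) for g in kids), len(kids))
p_carrier = Fraction(sum(g.count("N") == 1 for g in kids), len(kids))

print(kids)
print("P(affected) =", p_affected, " P(unaffected carrier) =", p_carrier)
```

Each child of two such carriers has a 1-in-4 chance of inheriting both defective copies and a 1-in-2 chance of being a healthy carrier like the parents, which is why the parents themselves show no symptoms.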

Then there was Bertrand. The Duke team thought it was likely that mutations on one of his candidate genes, known as NGLY1, were responsible for his problems. Normally, NGLY1 produces an enzyme that plays a crucial role in recycling cellular waste, by removing sugar molecules from damaged proteins, effectively decommissioning them. Diseases that affect the way proteins and sugar molecules interact, known as congenital disorders of glycosylation, or CDGs, are extremely rare—there are fewer than five hundred cases in the United States. Since the NGLY1 gene operates in cells throughout the body, its malfunction could conceivably cause problems in a wide range of biological systems.

The article points out that one of the things that has made tracking down the genetic cause of this disorder difficult is academic competition. Lots of people are born with novel genetic disorders; they go to their high-powered geneticist/MD, they get part or all of their genome sequenced, and then the sequence is kept private. It is now the doctor’s discovery: making it open knowledge would also make it likely that someone else would use it and publish it, and that the original team wouldn’t get credit for it. That doesn’t help patients, but it does help careers.

And that’s the next step. It’s clear that Bertrand has an anomalous form of NGLY1, but that alone doesn’t demonstrate that it is the cause (remember, he’s got 15,000 other variations from his parents’ genomes). The clincher would be to find other kids with similar phenotypes who also had NGLY1 variants; then you’d be relatively certain you’d found the cause. If you had lots of sequence data, you might also find people who had the NGLY1 variants but none of the disease symptoms, which would rule out NGLY1 as the cause. It’s a real problem that information gets locked up in little academic kingdoms and is difficult to pry out without promising authorship on a paper…and who wants to be the 63rd author on a paper that has 200 contributors, anyway?
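The confirm-or-refute logic amounts to pooling records from many groups and counting, under a recessive model, how many people with two damaging copies of the candidate gene do or don't show the phenotype. This sketch is hypothetical throughout: the `Record` fields, lab identifiers, and data are invented for illustration, and it only tallies; it does not run a formal statistical test.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """One person in a hypothetical pooled data set (all values made up)."""
    person_id: str
    damaging_alleles: int  # 0, 1, or 2 damaging variants in the candidate gene
    affected: bool         # shows the phenotype of interest

def tally(records):
    """Cross-tabulate biallelic carriers against phenotype (recessive model)."""
    counts = {"affected_biallelic": 0, "unaffected_biallelic": 0,
              "affected_other": 0}
    for r in records:
        biallelic = r.damaging_alleles == 2
        if biallelic and r.affected:
            counts["affected_biallelic"] += 1    # supports causality
        elif biallelic:
            counts["unaffected_biallelic"] += 1  # evidence against causality
        elif r.affected:
            counts["affected_other"] += 1        # cases this gene doesn't explain
    return counts

pooled = [
    Record("lab_A_01", 2, True),
    Record("lab_B_07", 2, True),
    Record("lab_C_03", 1, False),
    Record("lab_C_04", 0, True),
    Record("lab_D_12", 2, False),
]
print(tally(pooled))
```

None of this works unless the records from different labs can actually be pooled, and a real analysis would also have to allow for incomplete penetrance and sequencing errors before treating a single unaffected carrier as decisive.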

So the article ends up pointing out a flaw in poor Bertrand’s genome, and another flaw in the institution of science.

I have to point out another problem, though, and this one has been known in genetics for a long, long time: the high visibility of mutations of large effect, and how they skew our perception of how the genome works. These mutations exist, and Bertrand’s case is an excellent example: defects in a single gene wreak global havoc on the system, cause profoundly disruptive symptoms, and draw a bull’s-eye around themselves that attracts the attention of geneticists. But the overwhelming majority of allelic variants do nothing detectable at all (again, witness the 15,000 differences in Bertrand’s exome that were ruled out as causal), and in other genetic disorders we can’t rule out the possibility that several genes must be disrupted together to trigger the problem, in which case examining them one at a time means you miss the causes.
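A toy calculation shows how a one-gene-at-a-time view can mislead when two genes must both be disrupted. The model is entirely hypothetical: the both-genes-required rule, the 1-in-20 carrier frequencies, and independent inheritance of the two genes are assumptions chosen for illustration, not estimates for any real disorder.

```python
from fractions import Fraction
from itertools import product

# Hypothetical carrier frequencies for damaging variants in genes A and B.
FREQ = {"A": Fraction(1, 20), "B": Fraction(1, 20)}

def affected(carries_a: bool, carries_b: bool) -> bool:
    """Hypothetical model: disease only when both genes are disrupted."""
    return carries_a and carries_b

def joint_table():
    """Probability of each (carries_A, carries_B) class, assuming independence."""
    table = {}
    for a, b in product([True, False], repeat=2):
        pa = FREQ["A"] if a else 1 - FREQ["A"]
        pb = FREQ["B"] if b else 1 - FREQ["B"]
        table[(a, b)] = pa * pb
    return table

table = joint_table()
p_a = sum(p for (a, _), p in table.items() if a)
p_a_and_sick = sum(p for (a, b), p in table.items() if a and affected(a, b))
p_sick = sum(p for (a, b), p in table.items() if affected(a, b))

print("prevalence:", p_sick)
print("P(affected | carries A):", p_a_and_sick / p_a)
print("share of A carriers who look healthy:", 1 - p_a_and_sick / p_a)
```

Under these assumptions 19 out of 20 carriers of the gene A variant are perfectly healthy, so anyone examining gene A alone would find plenty of symptom-free carriers and might wrongly conclude that it has nothing to do with the disease.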

We know this is the case in cancer, for instance. There are central players that frequently end up mutated to cause oncogenesis — myc, ras, and p53, for instance — but no cancer is caused by just one genetic change, and it requires multiple steps to initiate. Further, there are multiple components, each with their own likely cause: proliferation is different from suppression of apoptosis is different from metastasis, and every patient has a different genetic profile. That’s why you’re not going to find any responsible doctor claiming that they found THE gene that causes cancer and have THE cure.
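The "multiple steps" point can be made quantitative with a classic multistage argument (a textbook model swapped in here for illustration, not something the post invokes): if a cell lineage needs k independent hits, each arriving at some rate u, the chance of having all of them by age t is (1 - e^(-ut))^k, roughly (ut)^k when ut is small. The hit rate and ages below are hypothetical.

```python
import math

# Hypothetical multi-hit sketch: a cell lineage must acquire k independent
# "hits", each arriving at the same made-up rate u per year (Poisson process).
#   P(a given hit has occurred by age t) = 1 - exp(-u * t)
#   P(all k hits by age t)               = (1 - exp(-u * t)) ** k

def p_all_hits(k: int, u: float, t: float) -> float:
    """Probability that all k independent hits have occurred by time t."""
    return (1 - math.exp(-u * t)) ** k

U = 1e-3  # hypothetical hit rate per lineage per year

for k in (1, 2, 3):
    young, old = p_all_hits(k, U, 30), p_all_hits(k, U, 70)
    print(f"k={k}: P(by 30)={young:.2e}, P(by 70)={old:.2e}, "
          f"ratio 70/30={old / young:.1f}")
```

Requiring more hits makes the outcome both rarer and much more strongly age-dependent, which is the intuition behind multistage models of carcinogenesis.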

But here’s another example: a large genetic study that used techniques similar to those applied to Bertrand, looking for the heritable cause of a more complex and subtle disorder, schizophrenia. They didn’t find one. They found a hundred.

One clue to this complexity, and how schizophrenia as a disease is "built", has come this week in new research published in Nature which looks at the genetic basis for the illness. In one of the largest genetic studies of its kind, a team of scientists from around the world compared the genomes of 36,989 people with schizophrenia with 113,075 control participants. They identified 128 independent genes in the people with schizophrenia, 83 of which were not known about until now.
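The basic test behind a case-control genome-wide association study is simple: at each variant, compare allele counts in cases and controls. This sketch uses a Pearson chi-square test on a 2×2 allele-count table, which is one common textbook choice; large consortia typically use regression models with covariates instead. The sample sizes match the study quoted above, but the allele counts are invented, and 5 × 10^-8 is the conventional genome-wide significance threshold.

```python
import math

def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) for a 2x2 table of allele counts.

    Rows: cases vs controls; columns: alternate vs reference allele.
    Returns (chi2, p_value). No continuity correction is applied.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold

# Hypothetical counts for one variant. 36,989 cases -> 73,978 alleles
# (22% alternate); 113,075 controls -> 226,150 alleles (20% alternate).
chi2, p = allelic_chi_square(case_alt=16_275, case_ref=57_703,
                             ctrl_alt=45_230, ctrl_ref=180_920)
print(f"chi2={chi2:.1f}, p={p:.2e}, genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

Because hundreds of thousands to millions of variants are tested at once, the threshold is set far below 0.05, which is one reason such studies need tens of thousands of participants to detect variants of small effect.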

Although this is an important study, it would be false to say that genomics work will lead to an imminent breakthrough in terms of a cure for mental illnesses. What we can do with this information is to ask better questions about what to research next in this field, for example some of the new genes identified are involved with immune processes, which provides the first real evidence for a long-held hypothesis that connects schizophrenia with immune system problems.

The medical team studying Bertrand got lucky and found a single gene as the likely source of his problems (Bertrand is not lucky at all, though: that we know what’s wrong with him is a world away from being able to fix him). What makes people tick is a constellation of genes interacting cooperatively with one another, and you generally can’t map single genes to single phenotypic traits.

It’s going to be hard to figure that out. That’s why we need more biologists!


  1. madtom1999 says

    You don't just need more biologists, you need OPEN biologists.
    I’ve been working with open source code for 25 years now and if I prayed I’d pray the method spread everywhere.

  2. fergl100 says

    Sequencing is fraught with problems at the moment, but it is still relatively new. Nice article, and I’m sure things would be better if all geneticists took PZ’s thoughtful approach, rather than jumping in when an anomaly is found in one gene that could have something to do with the phenotype but equally might not.

    “but no cancer is caused by just one genetic change” – what about specific translocations being diagnostic of specific cancers?

  3. says

    “Diagnostic” is different from “causal”. Having a defective variant of BRCA1, for instance, means you are more likely to get breast cancer, so it’s good to know…but you won’t actually get breast cancer unless there are a series of other mutations in other genes.

  4. Le Chifforobe says

    Can somebody please explain this math to me?

    There are between thirty and fifty million base pairs in the human exome; the average child’s exome differs from each of his parents’ in roughly fifteen thousand spots.

    I have seen the mutation rate in humans cited as anywhere from 1.1 to 2.5 *10^-8 per base per generation. The numbers in the article come to 3-5 *10^-4 in a single generation, don’t they?

    What am I misunderstanding? Thanks!

  5. chris61 says

    @5 Le Chifforobe

    This isn’t necessarily mutation. If mother is AG at some spot and father is CG and kid is GG he/she will differ from each of his parents. There will be plenty of such spots in the human exome. But many, and possibly most, of the apparent differences will be introduced by the technology and the software used to analyze the data. Sometimes a person will appear to be homozygous at a particular location because the quality of the data is such that the second allele isn’t detected.

  6. zetopan says

    50E6 * 2 = 1E8 bases (half from each parent). Multiplying that by 2E-8 gives you 2. Hence in the exome you would expect about 1 or 2 mutations per generation. I am unsure what your point was since the child inherited a bad gene allele from each parent. Any mutations would be in addition to the child’s inherited

  7. moarscienceplz says

    This isn’t necessarily mutation. If mother is AG at some spot and father is CG and kid is GG he/she will differ from each of his parents.

    If that is what they are referring to as a difference, then 15k differences out of 30M base pairs sounds way too low to me. Unless many BPs are both twinned and identical in both parents, shouldn’t the offspring have different BPs from both parents most of the time? For example, if the mother is AT and the father is CG, the kid couldn’t have a match to either parent, right? Even in your example, the kid could match a parent only 50% of the time.

  8. Le Chifforobe says

    @6 chris61:
    Thanks for your reply. I was interpreting the ~15,000 differences as novel mutations. Although the quoted sentence simply counts base pairs in the exome, it does make more sense if you consider allelic differences (both parents heterozygous where the child is homozygous, or vice versa).

    Now I gotta look up the math on that. Not getting any work done today!

  9. chris61 says

    @8 moarscienceplz

    This is the exome after all (i.e. coding for protein) so many bp will be identical in both parents.

  10. dianne says

    That’s why we need more biologists!

    What we need even more than we need more biologists is more funding for biology research. We’ve got biologists quitting the field because they can’t get funding or can only get funding for meaningless, unimaginative projects. We need more money and we need the NIH and NSF to get a backbone and fund some high risk projects. Not to mention they need to start funding the best available project, not the one that their cronies proposed.

  11. dianne says

    Another thing we could use: Better science education for lay people so the average person has some understanding of basic biology and we can stop wasting money proving, yet again, that cell phones don’t cause brain cancer, vaccines don’t cause autism, and laetrile doesn’t cure cancer.

  12. gillt says


    There are central players that frequently end up mutated to cause oncogenesis — myc, ras, and p53, for instance — but no cancer is caused by just one genetic change, and it requires multiple steps to initiate.

    But not all mutations are created equal. One germline and one somatic mutation are sufficient to grow a tumor, hence the two-hit model of cancer genetics. Moreover, a germline mutation can cause genomic instability, actually increasing the likelihood of acquiring the “second hit”. Coincidentally, a two-hit model is also applicable to psychiatric disorders.

    Great post on a great NYer article, btw.

  13. dianne says

    no cancer is caused by just one genetic change

    What about CML? It often has secondary mutations, but is any mutation but the BCR-abl mutation required to make it happen?

  14. Sili says

    (Bertrand is not lucky at all, though: that we know what’s wrong with him is a world away from being able to fix him)

    Don’t some people consider it rude to equate a person with their illness this way?

  15. says


    Unless many BPs are both twinned and identical in both parents

    That is basically correct. For all the attention (justifiably) paid to differences in genomes, the vast majority (>99%) of the human genome (or basically any species) is fixed and largely invariant, i.e. everyone has the same homozygous genotype. Of course, 1% of 3 billion base pairs still leaves a lot of variation to sort through, but your inference that most of the genome must be ‘twinned and identical’ is true.

    Quick edit PZ, first sentence after the second blockquote: “The article points out that one of the things that has made tracking down the genetic cause of this disorder [seems like the word ‘difficult’ or ‘slow’ or something is missing here?] is academic competition.”

  16. moarscienceplz says

    Chris61 and jacobbasson
    Point taken about the exome being highly conserved, but 15k differences out of 30M is still only 0.05% different. And the kid’s disorder itself shows that there are at least three alleles of just that one gene.

  17. moarscienceplz says

    Wikipedia says there are 180,000 exons in the exome, so if the 15k differences are only those found in those exons, that would mean that up to 1 out of every 12 exons could have a BP difference, which I guess could sound reasonable. Still feels too low to me, but IANA Biologist.

  18. says

    Our patient community has experienced the ‘good news/bad news’ phenomenon with exome sequencing. Primary ciliary dyskinesia (PCD) is a rare, heterogeneous disorder which is difficult to diagnose. In the past decade 31 genes (of a total number not known at this point) associated with PCD have been discovered with many PCD-causing mutations on each gene. This complex genetic picture has translated into reporting nightmares with commercial genetic testing vendors. In one case, the report sent to the physician listed ’16 variants of unknown significance’ on a gene panel that only included 12 genes. Some of the commercial labs do a much better job, but the confusion these results cause for families and treating docs is astounding.

    What surprises me most is the number of physicians who don’t seem to really understand autosomal recessive inheritance, who are too eager to make the diagnosis based on any mutation on an associated gene, and who assume non-sib family members with minor symptoms are also sufferers because it is ‘genetic.’ The only thing more traumatic for families than getting diagnosed with an incurable illness is getting undiagnosed after they finally thought they had some answers, and it seems that, for now, our ability to gather genetic data often runs ahead of our ability to properly evaluate it.

    Definitely getting better all the time, though. When my daughter was diagnosed with PCD back in 1991 (on the basis of a biopsy), we were told there would never be any research into genetic disorders of ciliary function because the conditions were too rare and too genetically complex to justify the expense. The fact that gene identification for many of these rare conditions is now a reality is astounding and promising, not just for diagnosis, but also for providing hope for future therapies.

  19. Claire Simpson says

    PZ, I’m a fan, but you’ve got this wrong. The schizophrenia study, which was a meta-analysis of genome-wide association studies (GWAS) from an international consortium, is very different from exome sequencing a trio (2 parents plus affected kid). There is plenty of evidence that the compound heterozygous polymorphisms in NGLY1 found in Bertrand Might are truly causal. The sequencing approach described is very powerful for finding single genes of large effect, even in the case of compound hets.

    GWAS are a completely different study design, for a very different model of inheritance. GWAS are powered to find common variants of small effect, in disorders which have a complex inheritance model typically believed to include both genetic and environmental factors. The results of GWAS are much more difficult to interpret, and it has been depressing to read the mischaracterization of these kinds of studies.

    Your criticism of the current way that data are shared or not between researchers misses an important nuance of patient confidentiality. There are many statistical geneticists, such as myself, who believe that we can no longer pretend that deep sequencing data and large GWAS genotype data sets do not represent personally identifiable information: the genotypes themselves. The strides that have been made in terms of facial reconstruction from genotype data are both amazing and frightening in equal degree. Nonetheless, all NIH-funded researchers must deposit their data in the database of Genotypes and Phenotypes (dbGaP) unless there are good reasons for exemption. Most typically, these are due to samples coming from other countries where depositing the data would be a violation of the law of that country. Access to dbGaP is restricted for privacy reasons, but not to a great degree.

    Also, although you deride the idea that accessing data is often done in exchange for authorship, in fact it is this balance that drives my science forward. You have to recognize the enormous amount of money and effort that goes into recruiting patients into studies, especially prospective and cohort studies which follow patients for decades. Asking that anyone who wants to work with that data enter a collaboration with the doctors and scientists who have produced these amazing data sets is an entirely reasonable request and one that has been made more difficult by dbGaP. Yes, you can go get the data from amazing studies like the Framingham Heart Study directly from dbGaP. But if you have questions about the data not available in the database, who do you ask? You can contact the PI of the Framingham study, but what incentive is there for them to be forthcoming? We all of us have too many commitments to balance and asking that researchers give of their time when it doesn’t benefit their career is unfair. If you’re on the tenure-track, you simply cannot afford to be so generous, you will get no credit for it when you’re up for tenure. And if you don’t get tenure, well you’re kinda out of a job. Who does that benefit?

  20. Tigger_the_Wing, Back home =^_^= says

    My own rare-ish genetic condition, Ehlers-Danlos Syndrome, can be caused by mutations in any/all of eight genes, and affects pretty much my whole body (everywhere that collagen is present).

    Another of my conditions, ankylosing spondylitis, is associated with a particular gene variant, but isn’t necessarily caused by it.

    After reading the OP, I thought “No wonder it is difficult to diagnose multi-gene disorders, if the process is that complicated.”