A genomic X Prize

Here’s a marvelous idea: a race to sequence 100 people’s genomes in 100 days, with a nominal prize of 10 million dollars. As a tool to motivate the discovery of new technologies and gain prestige, I approve. It’s unfortunate that it is so anthropocentric, though. A similar contest to sequence 100 species’ genomes in 100 days would be much cooler, and would contribute far more to our understanding.

They’ve also got a second 100 genomes to sequence that will be drawn from a pool of celebrities. I have reservations there; the ones named seem to be mainly people who happen to be filthy rich (i.e., likely to donate money to feed their vanity), rather than ones that have some biological interest. If you’ve got to pick a celebrity, go for ones with specific physical attributes that will generate potentially interesting comparisons: what about sports stars and chess champions?

Of course, what defeats the whole intent of this contest is that they ought to just hand the samples to that technician on CSI, and he’d whip out the whole shebang in a half-hour.


  1. says

    they ought to just hand the samples to that technician on CSI, and he’d whip out the whole shebang in a half-hour

    counting down till Steve’s head explodes: 5…4…3…

  2. says

    I think this is a good idea, but the article has so much weirdness in it…

    Minute variations in the spelling of DNA letters throughout our genomes account for why people look different, and why some are prone to certain diseases.

    How do you misspell a letter?

    Mr. Diamandis says the second batch of 100 volunteers, known as the “Genome 100,” will be chosen and announced over time and will include ordinary people as well as celebrities.

    Are they hoping to find a polymorphism associated with fame?

    That means reading a person’s entire genetic code — which consists of six billion DNA letters arranged on 46 chromosomes — is still too expensive to undertake routinely.

    Actually, I really like this summary of the problem. Yes, we’re diploids, and heterozygosity matters.

    It will “attract teams from outside the stovepipe” who may make unexpected breakthroughs, he says.

    Interesting metaphor. What the hell is “the stovepipe”?

    The $10 million purse is being put up by Stewart Blusson, a Canadian geologist involved in discovering a trove of diamonds south of the Arctic Circle in 1991.

    There’s quite a bit of territory (even just within Canada) that’s “south of the Arctic Circle”. I guess he wants to keep the exact location of his find a secret.

    Hopefully, this competition will result in spin-off technology useful for sequencing other species, since I agree that 100 species would be vastly more useful than 100 individuals of one species.

    One more step down the long road to the pocket sequencer.

  3. ronpar says

    The problem with 100 species is that you would have to come up with a list. Otherwise someone could just sequence 100 bacterial genomes. (Not that having that information would not be interesting).

  4. Martin Corcoran says

    It’s highly likely with current technologies that this will be won by a directed PCR product sequencing strategy rather than a random cloning approach, as 100 human genomes (where we already know where to put the primers – we already have the sequence) will not require a great deal of redundancy.
    One human genome is really just the total sequence of 3 million 1 kb PCR products. Any unknown genome would require far more work as it needs subcloning and far more redundancy. Obviously technologies will advance but I think the first past the post in this one is likely to be a production line PCR and sequencing effort.
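
The "3 million 1 kb PCR products" figure in the comment above can be checked with a few lines of arithmetic. This is only a rough sketch: the 3 Gb haploid genome size is an assumed round number, and the function name and the optional overlap between neighbouring amplicons are hypothetical additions for illustration, not anything the commenter specified.

```python
# Back-of-the-envelope count of PCR amplicons needed to tile a genome.
# Assumption: a haploid human genome of roughly 3 billion base pairs.
HAPLOID_GENOME_BP = 3_000_000_000
AMPLICON_BP = 1_000  # the 1 kb product size mentioned in the comment


def amplicons_needed(genome_bp, amplicon_bp, overlap_bp=0):
    """Smallest number of amplicons that covers genome_bp end to end.

    n amplicons of length L with overlap o cover L + (n - 1) * (L - o)
    bases, so n is the ceiling of (G - o) / (L - o).
    """
    step = amplicon_bp - overlap_bp
    if step <= 0:
        raise ValueError("overlap must be smaller than the amplicon length")
    # Ceiling division using integers only, to avoid float rounding.
    return -(-(genome_bp - overlap_bp) // step)


per_genome = amplicons_needed(HAPLOID_GENOME_BP, AMPLICON_BP)
per_contest = per_genome * 100  # 100 individuals

print(f"{per_genome:,} amplicons per genome")
print(f"{per_contest:,} amplicons for 100 genomes")
```

With no overlap this reproduces the 3 million products per genome, or 300 million for the whole contest; allowing even a modest overlap between neighbouring products pushes the count higher.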

  5. lo says

    Is this for real? Anyhow, PZ, there’s a lot of money to be made in the sequencing business, and the business is more than self-sustaining, motivation-wise. Moreover, this is one of the prime examples where laymen or even smaller study groups can’t contribute much, unlike for instance the Grand Challenge – which IMHO was an excellent idea.

    Also, it is always a matter of the quality of sequencing (which sadly nowadays means the average size of the DNA fragments you can sequence at once). Unless you can sequence Mbp at once with high fidelity, the shotgun approach alone doesn’t do us much good – which you know as well.

    Lastly, you as a biologist argue of course for species, whilst from a medical perspective 100 sapiens are incredibly valuable in terms of discovering new approaches to drug delivery design that account for highly polymorphic regions, which especially matters when it comes to MHCs and the immune system at large.

    And I think you are kinda unfair on CSI there; I haven’t seen them use anything other than standard assays. Rather, bash them for the now ubiquitous Hollywood infinity zoom option. It took NASA several months with resolution-integrating algorithms to roughly triple the resolution of the discovery accident tape, which takes Hollywood’s actors seconds, and they not only triple the resolution of the original but googolplex it :D
    What a feat. Even Jesus and God can only pull that off within the confines of the Uncertainty principle.

    Anyhow, it’s a shame, given that those many false beliefs are the basis of the populace’s judgement against potentially good decisions proposed by scientists, who sadly are ALWAYS in a minority position.

    Personally, if anything, I see the future of sequencing lying in nanotechnology: the DNA being “read out” by a modified polymerase attached to a matrix to which many of those polymerases are attached. When they copy, they react with a functional group carrying a bioluminescent marker once the activation energy is provided by bonding to the phosphate-sugar backbone. There would be three requirements. First, a highly homogeneous and cooled environment in order to slow down the molecular kinetics. Next would be to have the DNA strands attached to the matrix with the polymerases and start them at the same time. Moreover, to stop at given times to resync them. The signal is a superposition of all the molecules reacting during the readout process and can then be processed by signal-processing software using Fourier and Laplace transforms to filter out those few strands which are pure noise – that is, completely out of sync – because an almost digital-like signal is rather wishful thinking and not achievable.

    Anyhow, that’s how I imagine the future of sequencing. But who knows; nature is just as wondrous as it was thousands of years back. We could even have in situ sequencing in a decade through some major breakthrough. Who would have thought of RNAi 50 years ago?

  6. says

    Martin, this problem will be offset by the often smaller genomes of other species. Of course, this assumes the 100 species will reflect the biosphere’s genetic diversity, which requires them to include many bacterial and protist species.

  7. Heterocronie says

    What species would be at the top of your list PZ? I assume you would pick animals (cephalopods)? Riftia or sponges would be cool, but my votes would have to be for archaea, anaerobic microbial eukaryotes, or choanoflagellates. By the way, anyone know how many prokaryote genomes the Venter Institute can sequence in 100 days?

  8. Stoic says

    “Of course, what defeats the whole intent of this contest is that they ought to just hand the samples to that technician on CSI, and he’d whip out the whole shebang in a half-hour.”

    In a darkened, half-lit “laboratory” no less.

    Really, can’t they afford some lighting for that show?

  9. M says

    Well, if we’re going to start with other genomes, I’d vote for a reptile next. Better still, get Kent Hovind to provide us with some DNA from one of his pet dinosaurs (they are just lizards that have grown up, aren’t they?), perhaps that vegetarian T-Rex he likes to go on about.

  10. lo says

    TAW, everyone has at least one genetic disease at the DNA level, but in most cases it doesn’t manifest and is rather compensated for by transcriptional feedback and many other mechanisms in place. So yeah, the persons to sequence will certainly be chosen wisely, but disease itself isn’t such a stringent requirement as far as the value of the person’s sequence goes. Of course it does manifest indirectly, in that any claim of a broad-spectrum drug with 100% efficacy should be made from the pulpit of a church only and not put into the ad space of various magazines or other media.

    It really is the “differential picture”, which just doesn’t exist until a certain number of people are sequenced, that is sooo valuable.

  11. Martin Corcoran says

    Related to this is the recent report in Science, ‘The consensus coding sequences of human breast and colorectal cancers’. In that study they sequenced the coding sequence of most human genes in 22 cancer patients – a total of 21 Mb per individual. That was enough to throw up a lot of interesting results. 100 complete human genomes would be interesting to compare, but remember it’s probably 100 times the sequence we need to uncover the vast majority of phenotypic and disease susceptibility variation.

  12. Man of the Sloth says

    A 100 species? Don’t let PZ decide which are on the list. It would look something like this:

    Argonauta argo
    Argonauta boettgeri
    Austrarossia antillensis
    Austrarossia australis
    Austrarossia enigmatica
    Bathypolypus arcticus
    Bentheledone albida
    Benthoctopus abruptus
    Benthoctopus berryi
    Benthoctopus canthylus
    Benthoctopus clyderoperi
    Eledone caparti
    Eledone cirrhosa
    Enteroctopus dofleini
    Euprymna albatrossae
    Euprymna berryi
    Graneledone antarctica
    Graneledone boreopacifica
    Graneledone challengeri
    Haliphron atlanticus
    Heteroteuthis dagamensis
    Heteroteuthis dispar
    Idiosepius biserialis
    Inioteuthis capensis
    Japetella diaphana
    Neorossia caroli
    Octopus (Abdopus) abaculus
    Octopus (Abdopus) tonganus
    Octopus adamsi
    Octopus aegina
    Octopus alatus
    Octopus alecto
    Octopus alpheus
    Octopus araneoides
    Octopus arborescens
    Octopus areolatus
    Octopus aspilosomatis
    Octopus australis
    Octopus balboai
    Octopus berrima
    Octopus bimaculatus
    Octopus bimaculoides
    Octopus bocky
    Octopus briareus
    Octopus brocki
    Octopus bunurong
    Octopus burryi
    Octopus californicus
    Octopus campbelli
    Octopus carolinensis
    Octopus chierchiae
    Octopus conispadiceus
    Octopus cyanae
    Octopus defilippi
    Octopus dierythraeus
    Octopus digueti
    Pareledone charcoti
    Rossia antillensis
    Rossia australis
    Rossia bipapillata
    Rossia brachyura
    Rossia bullisi
    Rossia enigmatica
    Semirossia equalis
    Sepia aculeata
    Sepia acuminata
    Sepia andreana
    Sepia angulata
    Sepia apama
    Sepia appellofi
    Sepia arabica
    Sepia aureomaculata
    Sepia australis
    Sepia bandensis
    Sepia baxteri
    Sepia bertheloti
    Sepia bidhaia
    Sepia braggi
    Sepia brevimana
    Sepia carinata
    Sepia chirotrema
    Sepia confusa
    Sepia cottoni
    Sepia cultrata
    Sepia dannevigi
    Sepiadarium auritum
    Sepiadarium austrinum
    Sepia dollfusi
    Sepia dubia
    Sepia elegans
    Sepia elliptica
    Sepia elobyana
    Sepia elongata
    Sepiella cyanea
    Sepiola affinis
    Sepiola atlantica
    Sepiola aurantiaca
    Sepiola birostrata
    Thaumeledone brevis
    Vosseledone charrua

  13. lo says

    Chris, I haven’t read the requirements, but I am pretty sure one of the primary requirements is out-of-the-box thinking, as mentioned in the article.

    We are not a long way off – we really can’t predict anything yet, nor does the DNA sequence alone help us much.

    I think one of the grandest challenges would be a snapshot of a complete prepared cell, scanned by an atomic force microscope at various layers, and then trying to make sense of the landscape that would open before us. I am certain that from that picture alone we could discover some completely novel and as yet unnoticed molecular mechanisms.
    This too would be a mammoth project, but it would be worth it IMHO.

    It is the climbing of new mountains that brings us forward the most, not the revisiting of old ones. Sequencing itself is such a viable and potent business that I think the prize is peanuts and doesn’t speed things up much. Nor does it increase the number of students, professors, teachers, whatnot.
    But one good thing is PR – for both sides.

    Way more important is a fast method for sequencing the epigenome, raising the public’s interest in systems biology, and so forth. Sadly, the initiators of the project aren’t exactly all too versed in biology, and their awareness is rather post-DNA era.

  14. says

    By the way, anyone know how many prokaryote genomes the Venter Institute can sequence in 100 days?

    Yes, theoretically. The Joint Sequencing Center, which serves TIGR and the other institutes affiliated with the VI, can do about 100,000 reads/day. A typical bacterial genome needs about 50,000 reads to get decent 8x coverage. So, about two genomes per day.

    But it isn’t that simple. The real time consuming part of sequencing a genome is getting it closed completely. That involves running many PCR reactions to close the gaps. This can take months, even for a bacterial genome.
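
The throughput estimate in the comment above works out as follows. This is a minimal sketch using only the commenter's figures (100,000 reads/day, about 50,000 reads per bacterial genome); the 4 Mb genome size and 650 bp read length used for the coverage check are assumptions chosen for illustration, and the function names are hypothetical. As the comment points out, the result counts draft genomes only and ignores the months of gap closing.

```python
# Figures quoted in the comment.
READS_PER_DAY = 100_000
READS_PER_GENOME = 50_000
CONTEST_DAYS = 100

# Illustrative assumptions, not from the comment.
ASSUMED_GENOME_BP = 4_000_000  # a mid-sized bacterial genome
ASSUMED_READ_BP = 650          # a plausible Sanger read length


def draft_genomes(reads_per_day, reads_per_genome, days):
    """Whole draft genomes obtainable from the total read budget."""
    return (reads_per_day * days) // reads_per_genome


def fold_coverage(n_reads, read_bp, genome_bp):
    """Average depth: total sequenced bases divided by genome length."""
    return n_reads * read_bp / genome_bp


genomes_per_day = READS_PER_DAY / READS_PER_GENOME
total = draft_genomes(READS_PER_DAY, READS_PER_GENOME, CONTEST_DAYS)
depth = fold_coverage(READS_PER_GENOME, ASSUMED_READ_BP, ASSUMED_GENOME_BP)

print(f"{genomes_per_day:.0f} draft genomes per day")
print(f"{total} draft genomes in {CONTEST_DAYS} days")
print(f"about {depth:.1f}x coverage under the assumed sizes")
```

Under those assumed sizes, 50,000 reads give a little over 8x coverage, which is consistent with the figure quoted in the comment.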

  15. Mark says

    “what defeats the whole intent of this contest is that they ought to just hand the samples to that technician on CSI, and he’d whip out the whole shebang in a half-hour.”

    Yes, and with a groovy, hip techno soundtrack to go along with it.

  16. AJ says

    I have reservations there; the ones named seem to be mainly people who happen to be filthy rich (i.e., likely to donate money to feed their vanity), rather than ones that have some biological interest.

    I dunno: any characteristics associated with building and maintaining wealth sound pretty interesting to me.

    PS: It’s 100 genomes in 10 days, not 100.

  17. says

    They are soliciting medically important disease genomes, as mentioned in the article. It looks like they’ve got a mix of goals here, and the 100 genomes will be a compromise between public interest and scientific importance, and I think the balance is still on the scientific importance side. I wonder how we apply to get sequenced…

  18. says

    You know, the problems with this relate to a point you made about 6 weeks ago about the problems with privately-funded science: it’s all glamor, no discovery. Would a national granting agency hand out 10 million for a technical project of so little scientific significance, or would it reserve the money for sequencing new species, or various strands of the same bacterial/viral species?

  19. Steve_C says

    I wish that could explain the Conservative movement and corrupt republicans.
    But there’s no way that many people got dropped on their heads. ;)

  20. Loren Petrich says

    A possible problem with this prize is that it’s possible to “cheat” with human genomes — use existing human-genome sequences as scaffolds for assembling the sequenced genomes.

    But just the same, the techniques necessary for sequencing a human genome a day for $100,000 or thereabouts will be useful for sequencing other species’ genomes — there are a lot of gaps in current sequencing efforts’ coverage. There are more mammals being sequenced than non-mammalian vertebrates, for instance, and I don’t think any non-insect arthropods or mollusks or annelids have been sequenced. And while numerous prokaryotes have been sequenced, not many protists have been sequenced.

  21. says

    Man of the sloth, you forgot Architeuthis dux.

    Coin, thank you, that answers my question (partly). So, misspelled letters in the human genome would be… novel pyrimidine or purine bases?

    If anybody, anywhere, finds a genetic disorder or disease associated with something other than A-T-G-C, that’s probably grounds for a Nobel. And yes, I’m ignoring the weird letters in tRNA, and Uracil, since we already know about those, and the weird letters in tRNA fall into a grey zone when defining “coding sequence”, in my opinion.

  22. says

    I’m all for encouraging science, and even for monetary prizes for the best work. But I fear that some of these things encourage the flashy and the popular, when all basic science needs support …

  23. sparc says

    In principle the challenge could be accomplished by current techniques, sufficient financing provided. However, one question remains: what resolution is required to judge a genome as sequenced? I guess there are still gaps in the published human and mouse sequences, and one must remember that there were even more gaps when the human genome was first published as being sequenced.

    Sequencing unknown genomes of 100 species would be much more challenging because sequencing 100 human individuals is indeed just resequencing. With emerging technologies like massively parallel sequencing, first suggested by Sid Brenner, and similar technologies (e.g. http://www.solexa.co.uk/wt/page/tech_approach) it is currently possible to sequence >80% of a human genome in a day, provided that sufficient computer memory is available to align the short sequences gained by this method. The remaining 20% are repetitive sequences that cannot be properly allocated, or are due to sequencing-resistant fragments. Alternative technologies based on array hybridisation face the very same problem.
    Still, sequencing 100 individual genomes at 80% resolution may tell us something. However, in my opinion it doesn’t make sense to pick 100 celebrities. As suggested in another comment, sequencing 100 genomes of patients with genetic diseases (and of course some unaffected relatives) or samples from different ethnic groups would be more worthwhile.

  24. sparc says

    One thing I must add. It is quite impressive how Sanger sequencing has improved over time. Those younger biologists who are used to just sending their samples to a sequencing department cannot imagine what an effort it was back in the late 80s and early 90s to achieve a good sequence read of 300 bp. It was a real success when you finished your day with a properly cast sequencing gel on which you could load your samples the next day. If after several days of exposure you had proper bands on the autoradiogram, it was time for a lab party. Indeed, in the beginning there weren’t even proper gel loading tips available. So many people used tiny glass pipettes and mouth pipetting via some rubber tube to apply the samples to the gel, which may not have been too good of an idea due to the use of 35S-labelled nucleotides. When chain termination failed, one had a try with Maxam/Gilbert chemical sequencing (is anyone still using this technique?).
    Sequence reading meant literally reading and typing the sequence on a computer’s keyboard.

    Sequence alignment and analyses required computers as big as refrigerators and took so long that one could have at least two cups of coffee and attend a seminar. Longer alignments made it possible to go shopping in the meantime. At the end of the day you still had the impression you had been working hard, especially when you printed the result. This took another hour. So it was easy to give your boss the impression that you’d been working late. However, in many cases the computers crashed, quite often in a way that made it impossible to restart my session, so that I had to ask somebody from the computer department to restart the engine (most of them being physicists, they never understood what could be so interesting about putting sequences together).

    Doesn’t sound too funny. Well, these were the most exciting times I have spent in the lab and if it was possible I would suggest a third Nobel award for Sanger.

  25. Peter Z. says

    I actually did some sequencing by the old-school method as part of a practical for uni 3 years ago. I think my class is the only one in the UK that still does it…

    I don’t think the computer situation has really changed, machines have got faster, but datasets have grown and software got even buggier…

  26. John B says

    The aim of this project is to provide an incentive for the development of technology that can provide rapid and reliable whole genome sequencing. As it states in the article this development is important in the field of pharmacogenetics for the introduction of personalised medicine in to a routine clinical environment. Hence the choice of 100 human genomes and not genomes from 100 different species. The data generated by this project (i.e. the 100 sequenced genomes) is secondary.

  27. lo says

    Hold it, guys! It ain’t possible that anything other than a pyrimidine or purine is present within the DNA, by design. Artificially it is certainly possible, but not in vivo, where dozens of enzymes are involved in the regulation and copying process. If these enzymes didn’t work, then it wouldn’t even have come to the first cell division.

    But pyrimidines and purines can be modified, of course, but they still stay either pyrimidine or purine nitrogen bases :)

    So the process of discovery excludes something that is already well known down to the last detail, such as the DNA. With molecular modelling it is pretty simple to even get an idea of what is possible and stable enough. E.g. some argue that PNA was the precursor of replicating molecules (I am not an advocate of the PNA prebiotic theory, for reasons I won’t go into).

    So a self-replicating molecule based on PNA would be worth a Nobel Prize. Many people forget what the Nobel Prize is really all about: not the grandest and greatest and smartest and boldest and most assiduous people, but those few people who discovered something that revolutionized science.

    So I even heard someone say he would give Sanger yet another Nobel just for the heck of it. That’s bull.

  28. lo says

    Oh yeah, Peter Z., that’s not fair. You know too little about software and hardware design itself. Design has changed fundamentally nowadays, especially software design. In principle you can design, with some external libraries, a completely crash-safe application where every address space is predicted ahead of the assignment, that is, exception handling along with the try clauses that most compilers already include as standard. Moreover, the OS usually can run for decades now without crashing under standard situations.

    I can’t imagine how it was back in the day, but it really ain’t that bad nowadays; scientific applications especially are usually made very stable if employed broadly, like Mathcad.

    As a general rule of thumb, when we as human individuals claim something, it is ALWAYS not an assertion that this is just that way, but in fact that we ourselves are ignorant and didn’t do the research of the millions of text lines available on really any topic nowadays.

    As another rule of thumb, every invention, every idea has been thought of as well by someone else; it is only the ones who are influential enough who are able to publish and get credited for the invention. Of course there is always a first when it comes to making an idea practical. But the social capacity of human ideas and thoughts is unlimited.

  29. windy says

    The aim of this project is to provide an incentive for the development of technology that can provide rapid and reliable whole genome sequencing. As it states in the article this development is important in the field of pharmacogenetics for the introduction of personalised medicine in to a routine clinical environment. Hence the choice of 100 human genomes and not genomes from 100 different species. The data generated by this project (i.e. the 100 sequenced genomes) is secondary.

    Actually no, a lot of genomes are going to be needed to find all the variable bits and connect them to anything meaningful, so those 100 genomes would be very useful for developing personalised treatments. Genomes of healthy celebrities from a possibly narrow ethnic background perhaps less so.

    Of course if the project leads to the development of a “pocket sequencer” or something similar, collecting data on human genomic diversity later will be trivial. But once the data is in, and we know where most of the interesting bits are, why routinely sequence whole genomes? So it seems to me that getting data on different genomes would be more important at this point than being able to sequence a whole genome extremely fast.

  30. Icequeen says

    the DNA of 100 famous people? Ooh the potential for Maury Povich levels of drama is almost tangible.

  31. John B says

    Windy: I agree, data from a large number of individual genomes will be required to identify variation responsible for disease susceptibility. Robust genetic epidemiology studies require sample sizes in the thousands to achieve sufficient power. Hence the need for this kind of high-throughput technology.

    So it seems to me we need to develop rapid technology at this point to help us collect data from a sufficiently large sample group on a feasible timescale. The technology will in turn enable us to collect the data to identify the interesting bits from lots of different genomes.

    The data generated from the xprize project will have some value, but the primary aim seems to be an incentive for the development of the technology.

    You ask ” Why routinely sequence whole genomes?”. Well, why not? If the technology is available and it is economical to do so, it would make sense to capture all possible variation within an individual. Imagine this level of genetic information coupled with the prospective collection of clinical data on a population scale, it would be an awesome tool.

    I have no strong feelings on the celebrity aspect, except to suggest it may be an attempted appeal to pop culture to improve media coverage.