I just got a copy of this paper in my email, straight from Santa Claes, and it’s a good thing, because when I checked, our library didn’t have a subscription to PNAS NorthPole. I think it was sent to me because I’ve been such a good boy this year (oh, you didn’t get one? We’ve found the naughty children, then!).


They’ve associated a 7-character amino acid sequence (for instance, FALALAA or NAVIDAD) with a common Christmas carol (“Deck the Halls” or “Feliz Navidad”), searched GenBank for all instances, and they’re calling the result the Carolome. I know, all Jonathan Eisen wanted for Christmas was another omics word.

The most recent version of the public genome database – GenBank – contains as of June 11, 2010 close to 3 × 10^11 base pairs. In line with studies attempting to identify all proteins derived from the database (proteomics), all metabolites (metabolomics) and all genes (genomics), we here have made a concerted effort to systematically identify all Christmas carols deposited in the sequence data. We here name this field of research Carolomics. The most abundant entry in the Carolome is ‘Deck the Halls’ (Deck the halls with boughs of holly, Fa la la la la, la la la la). We find this carol in 21 genomes. The second most prevalent carol in the Carolome is ‘I Saw Mommy Kissing Santa Claus’ found in 17 genomes including that of the wine grape suggesting a genetic link between mulled wine (aka Glögg) and Christmas celebration. Third most common carol in the Carolome is Ave Maria with 12 identified locations in the GenBank genomes. These findings establish a direct role for Christmas carols in the functional imprint and transfer of genetic information. In the future it will be essential for researchers to determine the presence of carolomes in sequence data; both to increase identified database constituents as well as to more fully and completely understand the proven transference of meme data between genomes.

Now maybe this means there’s a little bit of Christmas in all of us, except…when I scanned through the list of organisms carrying carol-associated sequences, I noticed a marked shortage of human sequences. In fact, none were listed at all. The Christmassy organisms mostly seem to be bacteria, with a few fungi and protists thrown in, with one exception: “Ave Maria” seems to turn up in pigs and rats.

There is another little problem with the analysis. They used HTLCALI (“Hotel California” by the Eagles) as a negative control (it was found nowhere), but there is a serious flaw in the amino acid code itself: there is no reasonable way to encode a 7-letter sequence for BAH HUMBUG, since you can’t use B or U.
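
For anyone who wants to check the B and U complaint at home, here is a minimal sketch in Python. It is my own toy, not anything from the paper: the function names `unencodable_letters` and `count_motif` are made up for illustration, and it assumes that only the 20 standard one-letter amino acid codes count (in the extended IUPAC set, B is an ambiguity code and U is selenocysteine, which is why neither is on the list). It tests whether a carol motif can be spelled in protein at all, and counts exact hits in a sequence you hand it; it does not query GenBank.

```python
# The 20 standard amino acid one-letter codes (no B, J, O, U, X, or Z).
# Assumption: only these count as "encodable", as the post implies.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def unencodable_letters(motif: str) -> list[str]:
    """Return the letters in `motif` that are not standard amino acid codes."""
    return sorted(
        {ch for ch in motif.upper() if ch.isalpha() and ch not in STANDARD_AMINO_ACIDS}
    )


def count_motif(protein: str, motif: str) -> int:
    """Count exact occurrences of `motif` in `protein`, overlaps included."""
    protein, motif = protein.upper(), motif.upper()
    return sum(
        1
        for i in range(len(protein) - len(motif) + 1)
        if protein.startswith(motif, i)
    )


if __name__ == "__main__":
    # Motifs mentioned in the post, plus the one that cannot be spelled.
    for motif in ["FALALAA", "NAVIDAD", "HTLCALI", "BAH HUMBUG"]:
        bad = unencodable_letters(motif)
        status = "encodable" if not bad else "not encodable, blame " + ", ".join(bad)
        print(f"{motif}: {status}")
```

The carol motifs and the negative control all pass the alphabet check, so HTLCALI being found nowhere is a real result and not a spelling problem. BAH HUMBUG fails on exactly the two letters you would expect.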