Why would anyone want a complete simulation, anyway?

The NY Times is touting a computer simulation of Mycoplasma genitalium, the proud possessor of the simplest known genome. It’s a rather weird article because of its combination of hype, peculiar emphases, and cluelessness about what a simulation entails, and it bugged me.

It is not a complete simulation; I don’t even know what that would mean. What it is, is a model of a real cell complex enough to uncover unexpected interactions between components of the genome, and that is a fine and useful thing. But as always, the first thing you should discuss about a model is its caveats and limitations, and this article does no such thing.

I’d like to know how fine-grained the model is; I get the impression it’s an approximation of interactions between molecular components, based on empirically determined properties of those components. I don’t think the authors have claimed otherwise, but the NY Times implies that we now have an electronic simulation we can plug variables into and get cures for cancer and Alzheimer’s, without ever having to dirty our hands with real cells and animals again.

That’s nonsense. Everything in this model has to be a product of analyses of molecules from living organisms; they certainly aren’t deriving the functions and interactions of individual proteins from sequence data and first principles. We can’t do that yet! The utility of a model like this is that it might generate hypotheses: if upregulating gene A leads, in the model, to downregulation of gene Z, a gene distantly removed from A, then we get a preliminary clue about indirect ways to modulate genes of interest. The next necessary step would be to test potential drug agents in real, living cells. This model will have a huge mountain of assumptions built into it, and you can only build further on those speculations so far before it is necessary to cross-check against reality.

Also, isn’t it a bit of a leap from a single-celled, parasitic organism like M. genitalium to human cancers and brain disease? Yet there it is in the article’s second paragraph, a great big bold exaggeration.

And then there’s the really weird stuff. Some people need to step back and learn some biology.

“Right now, running a simulation for a single cell to divide only one time takes around 10 hours and generates half a gigabyte of data,” Dr. Covert wrote. “I find this fact completely fascinating, because I don’t know that anyone has ever asked how much data a living thing truly holds. We often think of the DNA as the storage medium, but clearly there is more to it than that.”

What the hell…? Look, I could (if I had the skills) write an hourglass simulator that calculated the shape, bounciness, and stickiness of every grain of sand and stored the trajectory of each grain as it fell; by storing enough data for each grain, it could generate even more than half a gigabyte of data. So? That wouldn’t mean an hourglass is a denser source of information than a cell. The storage requirements for the output of this program do not tell us “how much data a living thing truly holds”; that statement makes no sense.
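To put rough numbers on the hourglass point, here’s a toy calculation. Every figure in it (10,000 grains, six recorded values per grain, 8-byte floats, the timestep counts) is a made-up assumption for illustration, not anything from the Covert model. A raw trajectory log is just grains times timesteps times recorded fields times bytes per number, so its size is set by what the programmer chooses to record, not by anything intrinsic to the hourglass.

```python
# Toy hourglass logger arithmetic. All numbers here are illustrative
# assumptions, not taken from the M. genitalium simulation or any real hourglass.
BYTES_PER_FLOAT = 8  # one 64-bit double per recorded value


def log_size_bytes(grains: int, timesteps: int, fields_per_grain: int) -> int:
    """Size of a raw trajectory log: one float per field, per grain, per timestep."""
    return grains * timesteps * fields_per_grain * BYTES_PER_FLOAT


# Same hourglass, same physics; only the recording choices change.
grains = 10_000  # assumed number of sand grains
fields = 6       # x, y, z position plus x, y, z velocity

coarse = log_size_bytes(grains, timesteps=1_000, fields_per_grain=fields)
fine = log_size_bytes(grains, timesteps=10_000, fields_per_grain=fields)

print(f"coarse log: {coarse / 1e9:.2f} GB")
print(f"fine log:   {fine / 1e9:.2f} GB")
```

Sampling ten times as often yields ten times the “data” (0.48 GB versus 4.80 GB) from the identical hourglass, which is exactly why output size says nothing about how much information the thing being simulated holds.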

As for “We often think of the DNA as the storage medium, but clearly there is more to it than that”…jebus, does a professor of bioengineering really need to go back and take some introductory cell biology courses, or what? Heh. “More to it than that.” I’m glad to see that someone needed an elaborate computer simulation to figure that out.

I am, for some reason, reminded of the time I attended a seminar by a computer scientist on an exciting new simulation of the genetic behavior of viruses that, I was told, would have great predictive power for epidemiology. One of the first things the speaker carefully explained to us was how they’d incorporated sexual reproduction into the model. I wish she’d waited until the end to say that, because it meant that I sat through the whole hour-long talk with absolutely no interest in any other details.