DEVELOPMENTAL BIOLOGISTS RULE ALL THE THINGS


I keep telling you guys this, but some of you non-developmental biologists dare to disagree — but it’s true, I’ma gonna tell you, developmental biology is the greatest scientific discipline of all time. We have confirmation, too: The Nobel Prize in Physiology or Medicine for 2012 has been announced, and it goes to two developmental biologists, John Gurdon (about time) and Shinya Yamanaka. It also goes for research in stem cells.

Gurdon carried out the initial crucial experiment years ago. He sucked the nucleus out of an adult frog skin cell and injected it into an enucleated frog egg. What happened next was that in some cases, the nucleus was reprogrammed into a pluripotent state — instead of being a skin cell with its specific suite of active and inactive genes, it was transformed into an egg cell equivalent, and went on to divide and dutifully create a genetic copy of the donor frog…a clone. This was the precursor to all the animal cloning experiments that have gone on since.

Gurdon’s experiment worked, but we didn’t know how — we knew that the environment of the cytoplasm of the egg cell somehow reset the epigenetic state of the skin cell nucleus, but we were blind to what was actually doing the work.

Enter Yamanaka and his colleagues. What they did was figure out what genes constituted the reset button of the cell, and by expressing them could force adult cells to revert to a pluripotent state. Their approach is a kind of brute force global activation of the four genes identified as key triggers in mammalian cells, but it works: adult mouse cells have been transformed, and then go on to develop into clones of the donor, and it also works on human cells, although no full human clones have been produced — just tissue collections in a dish, or teratomas in mouse hosts.

These are important steps in developing tools to allow us to sculpt adult cells into tissues and organs and whole organisms. Notice that the work was done at Cambridge and Kyoto which, for the geographically challenged, are not American universities. There is good stem cell work being done in the US, but it’s hampered by regulations and restrictions that European and Asian universities do not suffer from. Unleash all the developmental biologists, because we must rule all the things everywhere!

Comments

  1. IslandBrewer says

    Booyah!

    Take that all you Xenopus detractors!

    Yes, I did my PhD. in a Xenopus lab, why do you ask?

  2. jand says

    early warning system, apostrophe abuse detected

    … it’s specific suite of active and inactive genes, it was transformed into an egg cell equivalent…

    friendly neighborhood jand

  3. Taemon says

    I’m not a guy.

    But it’s way cool stuff. My brother works in it and sometimes gushes over his embryos. (But then, my brother is a guy.)

  4. jasonnishiyama says

    “I’ma gonna tell you, developmental biology is the greatest scientific discipline of all time.” well almost, we all can’t be astrophysicists… :)

    [running and ducking]

  5. phira says

    Totally awesome. I actually worked with iPS cells at my old job. As much as I don’t think there’s anything wrong with using hES cells (and therefore don’t see iPS as a way to avoid using hES and being damned to hell, etc), being able to take fibroblasts from a patient and turn them into the cell type that the patient needs?

    SCIENCE RULES.

  6. ethicsgradient says

    An excerpt from John Gurdon’s school science report (aged 15):

    …at times he has been in trouble, because he will not listen, but will insist on doing his work in his own way. I believe he has ideas about becoming a scientist; on his present showing this is quite ridiculous… it would be a sheer waste of time, both on his part and of those who have to teach him.

    http://news.admin.cam.ac.uk/news/2012/10/08/professor-sir-john-gurdon-awarded-nobel-prize-in-physiology-or-medicine/

  7. moarscienceplz says

    Question for PZ (or any biologist):
    I’m sure that all the god botherers are going to say this means that there is no longer any reason to experiment on embryonic stem cells. Are they right? If not, why?

  8. Nemo says

    although no full human clones have been produced

    That we know of.

    Unless you believe the Raelians.

  9. TriffidPruner says

    adult mouse cells have been transformed, and then go on to develop into clones of the donor, and it also works on human cells, although no full human clones have been produced — just tissue collections in a dish…

    What a fascinatin’ dilemma this must pose for those who try to relate “soul” to “egg” and ponder on when the “ensoulment” of a foetus takes place.

    OK, so you take a skin cell and you make it so that it could in principle develop into a full human. Which is precisely the quality that makes some say that a fertilized egg is a person. So for them, such a rebooted skin cell ought to be, in principle, a person. But, what soul does it have? Could it share the soul of the donor? Or does god mint a new soul every time a cell becomes pluripotent? Or does god wait to see if the cell will in fact develop into a human before making a soul for it? Oh, but if that were so, then surely god would also wait to see if a fertilized egg from normal source were going to develop, in which case, it isn’t a person after all, just a cell. Oh, worra worra.

    And, who will speak for the civil rights of the tissue clusters in the petri dishes! Who, I say!

  10. Ichthyic says

    Ah, but where would you Devo people be now without us molecular people PZ?

    only slightly lagging, since it would have been realized long ago that to move forward, there would be a need to understand the biochemistry involved in development as well, so if molecular biologists didn’t exist, they would have been invented.

    IOW, it’s nonsensical.

    it’s like saying:

    “Where would biologists be without physics!”

  11. IslandBrewer says

    And where would physicists be without biology keeping them alive, huh? Checkmate, Lawrence Krauss!

  12. Owen says

    I took his Part 1B (undergrad) developmental biology course back in the 90’s. I shall now bask in his reflected glory…

  13. johnharshman says

    I keep telling you guys this, but some of you non-developmental biologists dare to disagree — but it’s true, I’ma gonna tell you, developmental biology is the greatest scientific discipline of all time.

    Right, and that’s why Theodosius Dobzhansky said “Nothing in biology makes sense except in the light of embryology.” Oh, wait.

  14. Paul W., OM says

    Nice bit of computer science, this, figuring out a few basic rules in a forward-chaining production system and brutally rebooting it.

    Pretty cool hack.

  15. Paul W., OM says

    I keep telling you guys this, but some of you non-developmental biologists dare to disagree — but it’s true, I’ma gonna tell you, developmental biology is the greatest scientific discipline of all time.

    Right, and that’s why Theodosius Dobzhansky said “Nothing in biology makes sense except in the light of embryology.” Oh, wait.

    I thought it was “nothing interesting makes any sense except as computation.”

    Oh, wait.

    Funny how biologists keep rediscovering natural computational principles that computer scientists reinvented decades ago.

    The genome is literally mainly a computer program. If you understand what computation actually is, it’s not a weak metaphor, but a basic literal truth.

    Evolution is a naturally occurring algorithm for exploring design space, which sometimes finds ways of building computing systems, and sometimes even finds ways of building computing systems that build computing systems, e.g., brains, that can understand computing systems.

    Everything interesting is fundamentally computational.

    And computer science is of course the queen of the interesting sciences.

  16. ChasCPeterson says

    I feel kind of bad for Louis. It’s got to be a drag being stuck with waving the chemistry dick.

  17. Louis says

    I have broad shoulders to bear the weight of the chemistry di…

    …okay, this analogy is taking a disturbing route.

    Louis

  18. Paul W., OM says

    So who is Paul W.? Is he a one-trick pony as he appears here?

    When it comes to disciplinary dick-waving, maybe.

    Sort of a one-dick pony with maybe a half dozen tricks.

  19. johnharshman says

    Ooh, I know that one. It’s me.

    But the question, if it was too subtle for you, was whether Paul W. ever has anything to say besides asserting that the genome is a computer program. And, apparently, his penis size.

  20. Paul W., OM says

    John, Google is your friend.

    Try the obvious combinations—of, say, “paul w.”, OM, and pharyngula, and when you figure out OM and why it follows my name in particular, you’ll have the answer to your question.

    That’s one reason I put the OM in there—so you can do that.
    The main reason is that there’s another Paul W. as well as various other Pauls and a Dave W. that I’ve been repeatedly confused with.

    (And BTW, I notice with a little googling that the other Paul W. has sometimes commented in the same threads as you at SB, e.g., at Tet Zoo, where I’ve also commented a little bit. Sigh. That usually wasn’t me at TZ. *sigh*. Unfortunately, a lot of my older comments don’t have the OM, so you can’t tell, but most old SB and FTB comments by “Paul W.” at both sites were from me, not him.)

    As for my dick size, hey, others were going on about dick-waving, and you had to go and insinuate that I was a one-trick pony, and it was just too easy.

    Sorry if I offended your delicate sensibilities.

    Now if you have something substantive to say, by all means go for it.

  21. johnharshman says

    You could have answered my question. OK, you’re a mensch. But do you really think that the genome is a computer program, and that evolutionary biology is just a rediscovery of computer science? Or was that a very clever joke? If intended seriously, it’s the sort of thing that obsessives tend to say. And you stepped on my joke too.

  22. Owlmirror says

    It’s been known for more than a hundred years, ever since Maxwell, that all physical systems register and process information. For instance, this little inchworm right here has something on the order of Avogadro’s number of atoms. And dividing by Boltzmann’s constant, its entropy is on the order of Avogadro’s number of bits. This means that it would take about Avogadro’s number of bits to describe that little guy and how every atom and molecule is jiggling around in his body in full detail. Every physical system registers information, and just by evolving in time, by doing its thing, it changes that information, transforms that information, or, if you like, processes that information. Since I’ve been building quantum computers I’ve come around to thinking about the world in terms of how it processes information.

    Seth Lloyd

  23. Amphiox says

    and sometimes even finds ways of building computing systems that build computing systems, e.g., brains,

    Which go on to build computing systems, which have not (yet) figured out how to build computing systems.

  24. Paul W., OM says

    You could have answered my question.

    What question? Whether I’m a one-trick pony? I did answer that.

    That question didn’t seem to be directed at me. It seemed to me you’d provisionally decided I was some sort of crank not worth talking to, and you chose to ask others whether you were right.

    That’s not a friendly conversation starter.

    But do you really think that the genome is a computer program, and that evolutionary biology is just a rediscovery of computer science? Or was that a very clever joke?

    Is that the question(s) you’re referring to? You didn’t ask me that, and if you had I’d happily have answered you.

    Instead you chose to speculate about other things, apparently amounting to (1) whether I don’t have much worthwhile to say in general, (2) am some sort of neurotic obsessive about this one pet subject—and implicitly, probably not worth talking to about that, either, and (3) am arrogant or insecure about my dick size or something, because I tagged a “dick-waving” joke that was already on the table.

    Is that how you normally start conversations—by (1) making several derogatory insinuations about somebody, (2) questioning whether they’re worth talking to at all, in an aside to others, (3) not actually engaging the subject at hand, even a little bit, and apparently avoiding doing so, and then (4) faulting the person you’ve put on the spot in that way, for not answering questions that you never asked?

    Golly.

    Sorry, that’s not the kind of thing that makes me want to launch into a discussion of a weird and subtle topic about which I do think I have interesting but easily misunderstood things to say. It doesn’t make me confident that you’ll actually listen to what I’m saying, and think about it, rather than taking cheap shots to defend a point of view you’ve already settled on.

    If intended seriously, it’s the sort of thing that obsessives tend to say.

    And you’re still doing it.

    I’m not in the mood for that shit.

  25. Amphiox says

    Even if the genome is a computer, it is not accurate to say that evolutionary biology is a “rediscovery” of computer science.

    Computer science is about building computing systems de novo and figuring out all the various ways in which you can make them work.

    In evolutionary biology, you already have a computing system* sitting in front of you and you already know that it works. Your challenge is to figure out the ways in which it, specifically, works.

    The distinction is important.

    * minus the starting assumption, you also have to first figure out if it is, in fact, a computing system.

  26. Paul W., OM says

    Owlmirror,

    I don’t know if you meant to imply much with that quote, but that’s not the kind of “computing” I’m talking about. I’m not talking about “information processing” in the sense of very low-level information theory, with any little change that causes any little change counting as information being “processed” and “stored.”

    I’m talking about computation (some digital, some not) in a stronger sense, transforming “interesting” patterns according to “interesting” patterns in “interesting” ways, and mostly insulated from a lot of other changes in things.

    (What makes the nucleus of a eukaryote mostly a computer is partly that it keeps most stray chemicals that could affect gene transcription out. That allows genes to model relationships between propositions, etc., without random things mucking up the modeling much. Genes mostly switch other genes on and off, or turn them up or down, without irrelevant influences wrecking the intricately patterned switching and knob-turning.)

    I don’t want to try too hard to make “interesting” precise—shades of Behe—but it’s got to be sufficiently complex but patterned (not complex in the sense of incompressible random bits) that you can feasibly do the kinds of things you often do with computers—model nontrivial external things, keep tabs on nontrivial internal state, and interpret and adapt to both. The “language” or structuring principles of the machine should allow you to do fairly arbitrary transformations between inputs and intermediate values and outputs, so that you can implement abstractions that reflect the logic of whatever it is you’re computing about.

    In the case of the genome, it’s more than that, because it’s a programmed computer—there’s a recognizable program made of discrete instructions, distinct from the machinery that interprets those instructions, in a recognizable low-level “machine language.” (Which isn’t a serial, sequential-by-default language like for a Von Neumann machine, but is easily recognizable as a different, independently known type of computer language—a nondeterministic parallel forward-chaining “production system”.)

    The cell doesn’t just “process information” in a pedestrian sense. It clearly computes according to a program, and that matters a lot. You have a fairly general, flexible processor, and a program you can modify, and IMO that’s why evolution works so well, whether you use that CS terminology for it or not.

    There’s a low-level sense in which the rule language of genes is compact and direct. You can use the values of discrete variables (concentrations of transcription factors) to model discrete concepts—truth values of boolean propositions, degrees of truth of fuzzy propositions, degrees of evidence for boolean propositions, etc. And you can write simple logic-like rules about relations between propositions (e.g., if A and not B then C), and you can manipulate crude quantities. (E.g., “if lots of A and no more than a little B, then lots of C”, which depending on how the program is written might really mean that “if we’re pretty sure A is true, and have no good reason to think B is true, then strongly infer C,” or for fuzzy concepts it might mean “if A is very true and B is no more than a little true, then C is very true.” And of course for actually quantitative concepts it might mean “if there’s a lot of A and no more than a little B, then there’s a lot of C.”)

    The ability to fairly directly and concisely state relations among propositions and to process attached scalar quantities is simple, but very useful for modeling and controlling lots of things. As low-level languages go, it’s not a bad place to start in doing all sorts of things. It’s a low-level language weirdly similar to somewhat higher-level languages invented independently for modeling and control tasks, e.g., diagnostic “expert system” software, factory automation, and reactive mobile robots.

    The low-level “machine” language doesn’t commit to an interpretation of the quantities, i.e., whether they represent actual quantities, weights of evidence, or degrees of fuzzy truth, but that’s absolutely typical of machine languages. (Von Neumann computers have no concept of what a particular binary integer really means, either.)
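
    One way to make the “lots of A and no more than a little B, then lots of C” rule concrete is sketched below. It is only an illustration: treating levels as numbers between 0 and 1, using min for AND and 1 - x for NOT (a common fuzzy-logic convention), and the function name `infer_c` are all assumptions, not anything the cell or this comment specifies.

```python
def infer_c(level_a: float, level_b: float) -> float:
    """Toy rule 'IF A and not B THEN C' over levels in the range 0.0 to 1.0.

    Assumed convention: AND is min, NOT is (1 - level). The same function
    covers the boolean case (inputs exactly 0.0 or 1.0) and graded inputs.
    """
    return min(level_a, 1.0 - level_b)


# Boolean reading: A true, B false, so C is fully asserted.
print(infer_c(1.0, 0.0))
# Graded reading: lots of A, only a little B, so lots of C.
print(infer_c(0.9, 0.1))
# Lots of A but also lots of B, so only a little C.
print(infer_c(0.9, 0.8))
```

    Note that the function itself does not care whether the numbers stand for quantities, weights of evidence, or degrees of fuzzy truth; that interpretation lives entirely in how the rest of the program uses them.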

  27. johnharshman says

    Well, you sure put me in my place. I was responding, first, to your stepping on my joke, which annoyed me, and pissing on evolutionary biology, which annoyed me further. Character flaw, I know. But second, I was responding to the very common trope that the genome is like a computer. If you’ve read much in biology blogs/usenet, you should know that this is most often brought up by computer programmers who are clueless about biology and are often IDiots. (Computers are designed, the genome is a computer, therefore the genome is designed. QED) Based on your reply to Owlmirror, that isn’t you. But it would have been a reasonable hypothesis based on your initial post.

    Also based on your reply to Owlmirror, I don’t immediately see the value in the analogy. What isn’t a computer? How does a computer differ from a non-computer?

  28. Paul W., OM says

    johnharshman,

    Oh fuck, I’m sorry.

    I now see that I did three things in quick succession, in one post, that could—not unreasonably—be interpreted as trying to pick a fight, and singling you out for no good reason.

    I really didn’t mean to do any of that.

    My post was meant to just tag on your joke, following one joke with another without meaning to say (in any non-jokey way anyhow) that there was anything at all wrong with your joke. It was fine and I was just following one gag with another. I wasn’t disagreeing with you or criticizing your joke, but given that my joke had a little edge to it—not aimed at you, at all—it could sure seem that way.

    And then I went and expressed a related, controversial idea, that wasn’t particularly related to your joke, or aimed at you at all, but I didn’t make that a separate post, as I should have. I just said one thing and then another, all in what looked like a response to you. (And it may have looked even more like the whole post was aimed at you, because it starts with your name and a colon. That was only meant to be the header of the quote, for ease of reference, not an indicator of who the whole post was aimed at—it wasn’t all talking at you specifically, but I see now that it looks that way.)

    I was mystified why you seemed to be taking it upon yourself to cast doubt on my credibility and so on, for seemingly no good reason.

    It makes a lot more sense now—I fucking started it. I inadvertently singled you out and was weird to you, so I can see why you’d ask “who the fuck is this guy anyway?”

    Mea maxima culpa.

  29. Paul W., OM says

    johnharshman:

    As for the substantive content… the first thing to realize about what I’m saying is that I’m not trying to make an analogy, useful or otherwise. I don’t think that the genome is like a computer program, I think it literally is one, and that that should be uncontroversial.

    I’d be the first to admit that as an introductory “analogy,” it’s got its problems. Most people don’t know what is or isn’t a computer, and will naturally use whatever mental model of computers they have—much of which is likely to be specific to von Neumann machines, and much of it just wrong.

    On the other hand, I think it’s non-analogically true, and I tend to think that’s important. It may be worth explaining what a computer program is, and what it doesn’t have to be, so that people get the actual significance of the genome being a computer program.

    I’ll address your substantive questions (e.g., what is/isn’t a computer) in another comment.

  30. vaiyt says

    @38:

    Word.

    Computing systems are top-down. Someone comes in and makes up the rules, and those determine the behavior of the parts.

    The real world works bottom-up. The “rules” are human constructs that describe the interaction of the parts. The only way to figure out the rules is by investigating the real world.

  31. johnharshman says

    Paul W.

    Thanks for the detente. The internet makes us all look manic. I look forward to your explanation of the computer program-genome identity. I see I elided that to computer-genome, which is a bit of a different thing.

    While waiting for that, let me present one potential difficulty. Do you actually mean the genome, or do you mean the entire metabolism of which the genome is a part — genome, proteome, and the various other cofactors and signalling molecules that are involved?

  32. Owlmirror says

    Computing systems are top-down.

    Human computing systems are. But if “computation” is defined broadly enough, the universe is a computer.

    It’s just that all that it’s computing is the next iteration of reality.

  33. Paul W., OM says

    johnharshman:

    Do you actually mean the genome, or do you mean the entire metabolism of which the genome is a part — genome, proteome, and the various other cofactors and signalling molecules that are involved?

    Let me sketch the basic story, and I think you’ll see what I mean, or be able to ask a question in a form that is clearer to me…

    But in general I have two ways of dealing with stuff like that:

    1. much of the detail is in exactly how the machine is implemented, but doesn’t affect the machine language much. (For my story, and for most CS purposes, it doesn’t matter how genes are transcribed—e.g., are there special nanomachines that help fold proteins?—so long as they end up with the right specificity of action.)

    2. the machine language is doubtless more complicated than the very simple rules-and-propositions language I present here, but the language being more complicated wouldn’t make it not a programming language. It might make it an uglier or more powerful one, or a somewhat different one, or whatever, but the basic story holds. Big ugly programming languages are no less programming languages than simple pretty ones, and you can make the main points with simple pretty ones.

    Holy cow, the following turned out to be long… and I don’t really have time to make it short, so here it is…

    Most genes are discrete instructions for a computer implemented by the cytoplasm (which acts as working memory holding variable values) and the gene transcription mechanism.

    A typical gene—a “non-structural” gene that just serves to switch other genes on or off, or turn other genes’ activity up or down—is an instruction for that computer.

    To use a string of CS buzzwords, the particular kind of computer that the basic gene activation and transcription mechanisms implement is a parallel, asynchronous, forward-chaining production system.

    What that means is that the program is mostly just a set of rules that say what to do in response to what.

    A typical (non-structural) gene is just an if-then rule (a “production”) like

    IF A and B and not C THEN C and D and E

    The program variables A, B, C… are implemented as concentrations of chemicals in the nucleoplasm. (More precisely, they’re represented by the relevant binding sites on the molecules of those chemicals.)

    This is not an IF-THEN statement like in a serial program on your PC. It doesn’t get executed because the previous statement was executed and it’s the next one, and it does not implement flow of control. (There are no GOTOs in this machine language. There is no sequential stream of control to tell where to go.)

    A rule in a production system is less like a statement in a serial program than it is like an axiom in logic. The above rule (sorta) says that if A and B are true, and C is not, then the rule can “fire” and “produce” something that says C and D and E and put it in the working memory of the computer, to affect which other rules can “fire.”
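
    Here is a minimal sketch of that kind of rule in Python, purely to show the firing logic. The `Rule` class, the `forward_chain` function, and the second rule (IF D THEN F) are made up for illustration, and working memory is simplified to a set of symbols that are either present or absent, with no concentrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    requires: frozenset  # symbols that must be present for the rule to fire
    forbids: frozenset   # symbols that must be absent
    produces: frozenset  # symbols put into working memory when it fires


def can_fire(rule, memory):
    return rule.requires <= memory and not (rule.forbids & memory)


def forward_chain(rules, start):
    """Check all rules against the same snapshot of memory, add whatever
    the enabled ones produce, and repeat until nothing new appears."""
    memory = set(start)
    while True:
        added = set()
        for rule in rules:
            if can_fire(rule, memory):
                added |= rule.produces - memory
        if not added:
            return memory
        memory |= added


rules = [
    # IF A and B and not C THEN C and D and E
    Rule(frozenset("AB"), frozenset("C"), frozenset("CDE")),
    # Hypothetical downstream rule: IF D THEN F
    Rule(frozenset("D"), frozenset(), frozenset("F")),
]

print(sorted(forward_chain(rules, {"A", "B"})))
```

    There is no program counter anywhere in this. Which rule fires depends only on what is in working memory, and a rule's output matters only because it changes what other rules can do.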

    In biological terms, one rule is represented by one gene, and the preconditions on the “left hand side” of the rule are represented by promoter and repressor binding sites—places that molecules with appropriate sites can dock to enable or disable the gene from being transcribed.

    When the rule fires, i.e., the gene is transcribed, it produces a molecule (transcription factor) with binding sites that implement C and D and E.

    It’s very different from a serial, sequential von Neumann machine, but it’s a well-understood kind of computer. It’s certainly well-understood that if you implement one in hardware (as opposed to simulating it in software), it is definitely a computer. It’s way more than it needs to be to be just “a computer,” and is a pretty interesting, flexible kind of computer.

    To a lot of biologists who’ve done some programming, now, this basic kind of computer seems “not like a computer,” because it’s not like a von Neumann machine.

    But to computer scientists who know a little theory of computation and history of computing it’s exactly the opposite:

    This is an older and more fundamental kind of computer than the von Neumann machine, which only goes back to the 1940s.

    It’s even older and more fundamental than Turing Machines, which go back to the 1930s.

    Production systems were developed as a theoretical model of mechanical computing by Emil Post in the 1920s, to explore things like what can possibly be computed automatically by a machine. (Simple forms of production systems probably predate Post, too, but he’s the guy famous for developing them thoroughly and showing why they’re very interesting.)

    To me, it’s pretty interesting and kind of funny that Post was developing a simple and basic theoretical model of computing, and had no idea that he was reinventing something that nature had invented a zillion years before, and was still doing, e.g., in every brain cell he used to think of it.

    That’s why I react badly to biologists saying things like “the genome isn’t like a program” and “gene transcription isn’t like computation.”

    That is the diametric opposite of the truth, because it just doesn’t get any more program-like and computer-like than being a production system actually implemented directly in hardware. Holy cow, that’s a computer if anything is.

    By comparison, Turing machines are butt-ugly and awkward ways of doing what production systems do—as Turing himself famously proved—and von Neumann machines are just ridiculously overspecified and stereotyped production systems that can execute one instruction at a time, in a very stereotyped order, really really fast, exploiting the bizarre physical quirks of impure silicon and human manufacturing technologies circa 2000 C.E.

    Production systems aren’t basically quirky—they’re simple and clean and beautiful and timeless, as pure computers, with one foot in the world of formal logic, and the other in the world of “hey, you can actually build that, if you want.”

    It’s Turing machines and von Neumann machines that are weird, in basic theoretical and mechanical terms, compared to production systems.

    OK, enough CS soapboxing for the moment… and back to the ways gene transcription, as computation, is a bit quirky too…

    One slightly weird thing about gene transcription as a production system is that it’s stochastic. Whether a given rule fires is subject to vagaries of molecules bouncing around and maybe running into binding sites, limited gene transcription machinery that may be busy transcribing another gene, and so on. There’s a big element of randomness at a short timescale, and a “rule” doesn’t generally have its effect by firing once, to produce one molecule.

    That’s okay. It doesn’t make it not a production system, and certainly doesn’t make it not a computer—it is absolutely a production system, and absolutely a computer, just one that has an ugly quirk that can make it somewhat tricky to program.

    But to a computer scientist familiar with the relevant areas of computer science, that’s not actually very weird. People have actually built production systems with rules that fire in randomized order on purpose—usually as theoretical constructs, but sometimes as real systems. We sometimes inject randomness for good scientific or engineering reasons.

    (For example, for many rule programs, there’s not supposed to be a strong dependence on exactly how often or in what order individual rules fire—if injecting randomness makes the program malfunction, there’s a bug that needs to be found and fixed.)
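
    A rough sketch of that kind of check, under assumptions: the toy rule set `RULES` and the function `run_random_order` are invented for this example, and the rules have only positive preconditions, which is the easy case where firing order cannot change the final result.

```python
import random

# Each rule is (required symbols, produced symbols). Hypothetical toy program.
RULES = [
    (frozenset("A"), frozenset("B")),
    (frozenset("A"), frozenset("C")),
    (frozenset("BC"), frozenset("D")),
    (frozenset("D"), frozenset("E")),
]


def run_random_order(rules, start, rng):
    """Fire one randomly chosen enabled rule at a time until none adds anything."""
    memory = set(start)
    while True:
        useful = [
            (requires, produces)
            for requires, produces in rules
            if requires <= memory and not produces <= memory
        ]
        if not useful:
            return frozenset(memory)
        _, produces = rng.choice(useful)
        memory |= produces


# Run the same program under 50 different random firing orders.
outcomes = {run_random_order(RULES, {"A"}, random.Random(seed)) for seed in range(50)}
print(len(outcomes), sorted(next(iter(outcomes))))
```

    If `outcomes` ever held more than one final state, that would be the sign of an order-dependence bug. Rules with negated preconditions can introduce exactly that, so they need more care.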

    Looking at gene transcription as rule-firing at a low level of analysis, and a very short timescale, what you have is basically discrete instructions that fire stochastically in a binary way—if there are appropriate promoter molecules docked to the control region (and repressors not docked) the rule may fire once, or not.

    Of course, that’s not how the rules are typically used. You don’t really care whether a rule fires once—producing one molecule of a transcription factor is unlikely to put that molecule in the right place at the right time to have the right effect on the firing of another gene.

    So in the general case, a rule may fire in a noisy analog stochastic way—the (approximate) rate of firing is what matters, to produce an (approximate) concentration of transcription factor molecules, to have the right (approximate) effect on the firing rates of other rules, and indirectly on concentrations of other molecules, firing rates of other rules, and so on.
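
    To see how a firing rate turns into an approximate concentration, here is a toy simulation. Everything in it is an assumption made for illustration: time is chopped into discrete steps, the rule fires at most once per step with probability `fire_prob`, each product molecule independently decays with probability `decay_prob` per step, and the name `mean_level` is made up.

```python
import random


def mean_level(fire_prob, decay_prob, steps=20000, burn_in=1000, seed=0):
    """Average number of product molecules once the system has settled."""
    rng = random.Random(seed)
    molecules = 0
    total = 0
    for step in range(steps):
        # Each existing molecule survives this step with probability 1 - decay_prob.
        molecules = sum(1 for _ in range(molecules) if rng.random() >= decay_prob)
        # The rule fires at most once per step, producing one molecule.
        if rng.random() < fire_prob:
            molecules += 1
        if step >= burn_in:
            total += molecules
    return total / (steps - burn_in)


low = mean_level(fire_prob=0.2, decay_prob=0.02)
high = mean_level(fire_prob=0.8, decay_prob=0.02)
print(low, high)
```

    No single firing means anything here, but the long-run level hovers around fire_prob / decay_prob, so a rule that fires four times as often holds roughly four times the concentration. That approximate level is the value other rules actually read.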

    That’s pretty ugly and weird looking to people who are used to purely digital programming on so-called “digital” computers, but it’s not un-computer-like at all. Analog computers are computers too, and for a while there, most computers so-called were in fact analog or hybrid computers—components were just too expensive to do everything digitally, when a single analog component could do something directly.

    And digital computers are analog computers. Digital computers are made of fundamentally stochastic and analog devices, plugged together and used in clever ways to hide the randomness from people who work at higher levels of analysis.

    But that’s computing too, and it’s done with computer science principles. (Even if most of the people who work at the analog level are called “electrical engineers” rather than “computer scientists.”)

    There is nothing non-computational about analog computing, or about using analog computation to achieve digital effects.

    And that’s what the bulk of the genome, as a program, does.

    At a very low level of analysis and short timescale—individual rule firings—rules fire in a digital but stochastic way.

    At a very slightly higher level of analysis, they fire in a graded, noisy, analog way, and some rules and sets of rules must be analyzed in those terms to understand what they’re doing.

    But many rules aren’t best analyzed at that level, most of the time. They use standard computer sciency tricks to achieve binary digital effects, i.e., switching things “fully on” or “fully off” and avoiding the middle when it counts. A rule may in effect be binary in its operation, and a transcription factor may represent binary variable values, by always having clearly high or clearly low concentrations when it matters to other rules’ firing.
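
    One way to picture "avoiding the middle" is a steep sigmoidal response. The sketch below uses a Hill-type function, which is a common textbook way to describe switch-like responses; it is swapped in here as an assumption, and the threshold and steepness values are arbitrary.

```python
def hill(concentration, threshold=1.0, steepness=8):
    """Graded input, nearly binary output when steepness is large."""
    c = concentration ** steepness
    return c / (threshold ** steepness + c)

def reads_as_on(concentration):
    """Treat the output as a binary value: clearly high counts as on."""
    return hill(concentration) > 0.5
```

    With a steepness of 1 the same function is a gentle, graded curve; with a high steepness, inputs a bit below the threshold give almost nothing and inputs a bit above give almost full output, which is the analog-implements-digital trick in miniature.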

    There are only a few reasonably simple ways of making that kind of mix of analog and digital work (provably, IIRC), and evolution clearly discovered at least some of the very same basic tricks that humans have (re-)invented in developing analog, hybrid, and digital computers. It must have, to get such things to work as well as they do and put us here talking about them.

    So far, I’ve explained why the stochastic quirks of the basic gene transcription shtick don’t seem all that quirky to some computer scientists, namely circuit designers who face the same low-level problems. It may seem like this isn’t very computery and is more of a weird specialized circuit design thing.

    But that’s not true. These sorts of issues are fundamental and recur at multiple levels in computer systems—including very high levels, in systems built by “computer scientists,” not “electrical engineers.”

    There are other computer scientists who face analogous problems all the time, though—and more of them—and they analyze things similarly and use similar hybrid techniques, even though all the computers they’re concerned with are “digital”.

    It turns out that when you have a bunch of processes on a bunch of computers interacting, you face a lot of the same issues in making subsystems with unpredictable performance work together in more predictable and efficient ways at a larger scale and a higher level of analysis. In complicated computer systems, the speed of program execution and the speed of communication often vary in ways that programmers can’t predict well, but their programs must cope with and not flake out. (E.g. due to other programs that may be running on some of the same computers, or communicating across shared communication channels.)

    The problems that come up at this high level are often similar to those that come up at a very low level, in circuit design—things are stochastic, and you have to construct programs that work despite parallel subprograms and cooperating distributed programs running at weirdly varying rates, etc.

    And often things aren’t so much stochastic, i.e., really random, as just unpredictably very strongly patterned, which is lots worse—you may know how to deal with randomness just fine, but not all the strong patterns that might pop up.

    So you may intentionally inject randomness into the system at crucial points to destroy any really bad, un-thought-of patterns, and make it stochastic, so you can deal with it as if it was fundamentally stochastic, which is at least understandable and doable.

    (For example, when two computers try to communicate across a shared channel at the same time, and “collide” by clobbering each other’s transmissions, they may “back off” for a randomized delay time before retrying. The randomness ensures that they’re unlikely to back off for the same time, and just collide again when they both retry. In the short run, one program may get victimized several times in a row, at random, losing the race to get to “talk” over and over, but in the long run, it’s very unlikely that any given program will be victimized very much very often; the randomness spreads the hurt around very fairly with very high probability, due to the law of large numbers.)
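
    A minimal sketch of that backoff idea, assuming two stations and a fixed window of possible delays (real protocols such as Ethernet grow the window after repeated collisions; the fixed window and the function names here are simplifications for illustration).

```python
import random

def resolve_collision(rng, window=8):
    """Two stations pick random delays until the delays differ.

    Returns (winner, rounds): winner is 0 or 1, the station with the
    shorter delay, and rounds is how many attempts it took.
    """
    rounds = 0
    while True:
        rounds += 1
        delay0, delay1 = rng.randrange(window), rng.randrange(window)
        if delay0 != delay1:
            return (0 if delay0 < delay1 else 1), rounds

def win_fraction(trials=10000, seed=0):
    """Fraction of collisions that station 0 wins over many trials."""
    rng = random.Random(seed)
    wins = sum(resolve_collision(rng)[0] == 0 for _ in range(trials))
    return wins / trials
```

    Any single collision can go either way, and one station can lose several in a row, but over many trials the win fraction sits close to one half.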

    Because of this, stochastic models (specifically including stochastic forward-chaining production systems, oddly like basic gene transcription) are sometimes used in distributed systems, though mostly as an analytical model in the design phase.

    (Hunks of work to be done are modeled as rules to be fired in a production system, and possible orderings of execution are analyzed for correctness, using a variety of techniques. Then the actual system is built in a more imperative way, in a conventional programming language, but with code that procedurally preserves the constraints on “rule firing” orders from the rule-based model. Weird, huh?)

    I could list and explain more things I’ve encountered that weirdly resemble the production system implemented by gene transcription, at various levels in various domains (in industrial process control, reactive mobile robots, etc.), but I won’t.

    I think you probably get the basic ideas I’m driving at, which are:

    1. The more I learn about genetics and evo-devo, the more things I find that are eerily familiar, even though of course there’s other stuff that just isn’t, and presumably plenty of both left to find out.

    2. A lot of this seemingly very quirky and specifically “biological” stuff about gene transcription isn’t as quirky in light of computer science. You are encountering some stuff we’ve seen before, and even some that we’ve seen over and over.

    3. Neither should be surprising if the genome is a computer program.

    4. Once you’ve seen the similarities to other computing systems, both real and theoretical, how on earth could you say it’s not performing computation? It looks like computation, walks like computation, and quacks like computation, and we computer scientists have a name for that sort of thing: computation.

    I realize that last bit isn’t a sound argument, but computation is just patterned information processing of patterned data.

    It doesn’t take much at all for a process to literally be some kind of computation, and gene transcription is clearly doing a lot of that, with outputs of very simple computations enabling other simple computations, in patterned ways that implement more complicated computations—computing functions with more inputs, more internal state variables, multiple stages, and so on.

    And it’s doing it in particular basic ways—e.g., nondeterministic rule firing, using analog to implement digital, etc.—that computer scientists have recognized as obviously computational since they were invented, decades and decades ago.

    BTW, I hope it’s clearer now what I meant about biologists rediscovering things that computer scientists reinvented decades ago. There are deep reasons why computer scientists had to discover some of those tricks to get “computers” to work, and evolution faced and solved the same problems hundreds of millions of years ago. (Billions, in some cases, I guess…?)

    It shouldn’t be surprising if computer scientists sometimes get there first.

    I did not mean to imply that computer scientists have done it all already, and biologists are doing nothing but reinventing things we already understand.

    I’m only saying that computer scientists do already understand some of the general kinds of things that genes do, in ways that at least some biologists might benefit from. At the least, they might be interested in the fundamental similarities, rather than dismissing computers as “very different” and computer science as irrelevant.

  34. johnharshman says

    Well, that’s a lot to digest, and there’s quite a bit I’m not clear on after one reading. But I have a thought or two.

    First, there’s no need to spend so much time on stochastic effects, as I never thought that was an objection to the genome being a computer program.

    Second, what seems to me to be crucial is to define “computer program” or perhaps just “computation”. Or maybe “computer”. How is it defined so that the genome is a program, but most other things in the world (weather, plate tectonics, a lobster) are not? Keeping the CS jargon to a minimum will maximize my chance of knowing what you mean.

    Third, I think there’s some biological confusion here. The majority of genes are not transcription factors. Do you mean to count only developmental genes as part of the computer program? Most genes are not part of the transcription and translation machinery either. And it seems to me that to the extent development might be considered a computer program, much of the programming is not in the genome. What parts of the organism are the program, and what are the computer? Those, if I think of it, are parts of my central objection to calling the genome a computer program: the difficulty of separating program from computer and the interactions of all the other parts of the system.

    Fortunately, everyone else has long since left the area, so we can consider this at length. But not so much length, if you please.

  35. Paul W., OM says

    Vaiyt, Owlmirror:

    I rarely know quite what it really means if someone says that X “is top down.”

    Computing systems are not, in general, designed in a clearly or straightforwardly top-down way. They are really designed opportunistically, skipping up and down levels, and often middle-out.

    And they’re often designed laterally, trying to combine aspects of different designs and find a coherent way to connect, organize, and reconceptualize them in a more or less top-down way.

    Like human organizations, the actual important relationships in complicated computing systems typically aren’t correctly expressed in hierarchical diagrams, and often cannot possibly be.

    In complicated systems, there are often multiple competing senses in which something may be “higher” or “lower,” more or less valid in different ways. E.g., a reloadable, changeable device driver is above the operating system in some senses, but below even the OS kernel or microkernel in others. A similar loadable OS policy module, such as a process scheduler, may be very high and very low in different and sometimes opposed senses.

    That kind of thing shows up in subtle ways in lots of kinds of complicated software—often crucial ways that few people can conceptualize clearly.

    The actual run-time functioning of actual computing systems, at the hardware level, is generally very bottom-up. Designing a computer system “hierarchically” and building it is ultimately a matter of arranging for regularities to emerge bottom-up at run time, just as they must in biology.

    E.g., in biology the illusion of system/organ/tissue/cell hierarchy is broken by cancer cells, which start misbehaving and reproducing like a growing organ, and don’t kill themselves when they’re supposed to.

    In a computer, similar bugs can cause havoc, e.g., a subroutine that doesn’t execute a correct procedure return when it’s supposed to let its caller resume execution, or a device driver that doesn’t yield control back to the operating system when it’s supposed to, or a hardware device that refuses to release a hardware resource needed to run the operating system kernel. (E.g., the main memory bus of the computer.)

    In any of those cases, biological or “computer”-al, a subunit has simply stopped voluntarily behaving as if it were part of a hierarchy, and started doing its own thing. It can do that because the conceptual “hierarchy” is ultimately a matter of “gentlemen’s agreements” about how to behave.

    When something stops playing by the rules of the game, and starts playing a different game, the hierarchy is fucked: the bottom-up emergence of the appearance of hierarchy stops happening and the illusion is broken; anarchy reigns.

    IMO this deep and basic similarity between biological systems and complicated computing systems is not just coincidental: biological systems are complicated computing systems. When a cancer cell goes astray, “refuses” to undergo apoptosis, and “chooses” instead to replicate, that’s clearly a computational error of the same fundamental sort as a procedure that won’t return, or a hardware device that won’t release a crucial resource to the conceptually “superior” CPU.

  36. Owlmirror says

    Paul W.: Thanks for #46. I found it deeply interesting — is there a reading list you could post that has more related information?

    I was reminded by this post by PZ from 2008.

    I have also been reminded that Richard Feynman wrote about DNA as a computer (among many other things acting as a computer), in Feynman Lectures on Computation.

    ======

    I understood “top down” to mean simply that computers made by humans are the end result of manufacturing processes designed by humans, resulting from the reification of human abstract thinking (over multiple iterations).

    Things made by humans, rather than things (or systems or processes) that grow or develop or evolve on their own.

  37. johnharshman says

    I was reminded by this post by PZ from 2008.

    The link is currently not working. Could you tell me what it reminded you of and what the post was about?

  38. Paul W., OM says

    Owlmirror,

    Unfortunately, I don’t have a reading list for you; I wish I did.

    As for PZ’s post, I remember that one, and I liked most of it a lot. (Unlike his “The genome is not a computer program” post from around that time, where I liked what he was saying about biology but thought he was completely straw-manning computers to create a false contrast, inadvertently, I’m sure.)

    One of the ironies there is that his points about casinos and extreme jitteriness reminded me very strongly of some parallel and distributed programming systems, which are utterly baffling on any local analysis. (The example I gave earlier of collision resolution in simple LANs is just the tip of the iceberg.)

  39. johnharshman says

    So, has this thread died before I could get any sort of response? Is anyone there? Sometimes the intertubes move too quickly for anything interesting to happen.

  40. Paul W., OM says

    I’ve been checking back, but it looks like Owlmirror is gone.

    I thought your question was for Owlmirror, but I could elaborate on why that post reminded me of eerily similar CS stuff, if you want.

    It’s about some key features of biology that PZ is prone to saying are “not like a computer program” because he doesn’t know about programs that have those very features, often for the same deep reasons biology must.

  41. johnharshman says

    Paul W.:

    Owlmirror replied to me. But I was trying to get a reply from you to my comment #47. Perhaps you missed it?

  42. Paul W., OM says

    Ah, yes, I lost track. (I have actually drafted several things that I haven’t posted here, and I lost track of what I had and hadn’t.) I’ll try to edit up an answer to #47 soon.

  43. Paul W., OM says

    johnharshman,

    As has been pointed out, there’s a broad sense in which any physical system can be said to compute.

    E.g. if I have a glass vase of a particular size, shape, and consistency, and hit it with a very precise series of blows from a ball-peen hammer, the resulting “output” (a pattern of glass shards on the floor, say) is a complex function of the inputs. I might be able to model some other system that way, and get some useful information out of that “computer,” but I can’t think of what. The pattern of shards is very idiosyncratic and unlikely to be useful for modeling many other things.

    Computer scientists usually mean something narrower, namely that

    1. some physical aspects of some physical system (the computer) can relatively easily be mapped onto other physical or abstract systems, and used to model them (at least in principle), and

    2. the physical system is structured in such a way that you could easily reconfigure (or program) it to model very different functions in terms of the same basic model parts (at least in principle).

    (Those “at least in principle”s will turn out to be important for getting the definitions right and reflecting what computer scientists actually do and don’t call “computers,” and why.)

    Lots of things are or can be computing devices, which compute functions, and which you might call simple computers, but we generally don’t. (They do compute, so they are obviously actually computers in much the same way simple machines like levers and inclined planes are trivial “machines,” but we normally use the word “machine” in a different sense, to refer to more complex assemblages.)

    One example of a fairly simple computing device or trivial “computer” is a simple thermostat with one moving part—a laminated strip, partly metal, which bends more at higher temperatures, because the laminated materials expand at different rates with increasing temperature. (The degree of curvature is thus a model of the temperature.)

    Another example would be a pair of graduated cylinders used as an analog adding device. You can repeatedly measure out columns of water with one cylinder, using height to represent a number, and pour the water into the other cylinder, which adds them up: the total height represents the sum of the added numbers.

    It’s important to notice that almost any convenient properties of any convenient materials can be used to compute, i.e., to model and/or control other things in patterned ways.

    (That’s relevant to your concerns about IDiots who think computers are intrinsically complicated in ways that must be intelligently designed. Computers are like other machines: they’re either very simple, or made out of simpler computers. Evolution is good at doing simple useful things, and combining them, so at least some computers can evolve incrementally in small steps.)

    You asked what’s not a computer, e.g., maybe continental drift or a lobster.

    I said above that a “computer” is a system that can easily be mapped onto other systems, and used to model them, at least in principle. It can also be easily reconfigured to model very different things, at least in principle.

    Continental drift makes a lousy computer, because it’s not clear what else it could be used to model or control, even in principle—e.g., even if you had the godlike ability to reconfigure it by moving continents around, changing the rate of the earth’s rotation, etc.

    The lobster is a weird example, because it does contain a special purpose distributed computing system, made up of a bunch of comparatively general purpose computing systems, inside cells.

    The lobster as a whole is a robot—a mobile machine with a fancy computing system inside, made of relatively general purpose, flexible-in-principle computers, inside cells.

    With some conceptually minor tweaks, you probably could in principle use a lobster’s distributed computing system to compute some other interesting, complicated and useful functions, e.g., by adding some instructions (genes) that compute relevant functions, and setting them up to communicate and cooperate by using new intercellular signaling molecules. You could probably use the resulting lobster as a fairly general-purpose computer for modeling all sorts of things, as well as for what it mainly computes, namely how to be a lobster. Input-output would be a bit tricky in practice, but theoretically it should work. (And that’s without hijacking its nervous system, which is a distributed computing system too.)

    So a lobster does comprise a fancy computer—a distributed computer made of lots of simpler fairly general computers—as well as a whole lot of other machinery to move the whole shebang around, feed it, pump blood, and so on. (All that machinery is at least mostly controlled directly or indirectly by the distributed computing system I’m talking about.)

    But that probably wouldn’t be a cost-effective way to build (or co-opt) a computer. If you were good enough at hijacking the lobster computer and using it for general computation, you could probably find an easier, faster way to compute, and build it from scratch.

    Let me give an example of a much simpler computer—the analog computer in the Norden bombsight—and use it to illustrate those “in principles.”

    The Norden computer was made up of a modest number of resistors, capacitors, and so on, and was used to model a very specific kind of situation: a bomber flying at high altitude, and a bomb falling from it to the ground. Various factors, such as altitude, airspeed, groundspeed, wind direction, and bomb air resistance profiles, were modeled by things like voltages on wires or accumulating charge on a capacitor. By using capacitors to compute integrals (accumulating charge given a varying input voltage is like computing area under a curve) and other functions, the whole situation could be modeled and used to tell when to release a bomb so that it would hit a given point on the ground.
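
    The "accumulating charge is like computing area under a curve" remark can be sketched digitally as a running sum. This is only an illustration of the integration idea under an assumed fixed time step; the function name is made up, and nothing here describes the actual bombsight.

```python
def integrate(samples, dt):
    """Accumulate input samples over time, like charge building up on a capacitor."""
    total = 0.0
    for value in samples:
        total += value * dt  # each sample adds a thin slice of area
    return total
```

    For example, feeding in a steadily rising input approximates the area of the triangle under that ramp, and the approximation improves as the time step shrinks.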

    We can look at that, and say that’s a computer, because

    1. we can see how it computes larger functions in terms of simpler ones, and how that models complex situations outside the computer in terms of their simpler components or aspects, and

    2. we can see how it could use the same pieces, combined in slightly and straightforwardly different ways, to model very different situations using the same basic “vocabulary”.

    Resistors and capacitors and such can be used to model lots of things that you model with summation, differentiation, integration, and so on, and those things can be used to model all kinds of stuff we use math for. (Or we can ignore the mathematical abstraction and say that the voltages and charges directly model aspects of the domain of application.)

    The Norden computer was hardwired to compute a fixed function of a few inputs—the wires were not actually easily movable—but the flexibility of the design matters to recognizing it as clearly a computer worthy of the name.

    That matters to seeing how biological systems compute, too. It may not be easy to actually “rewire” genetic regulatory networks, but it’s conceptually simple—you just change the DNA sequences for some proteins so that they fold up into the right shapes, in certain places, to match the binding sites you want to “connect” them to as a signaling molecule.

    The fact that such rewiring is difficult in practice, given the current state of the art, doesn’t matter—we can understand how genes are “wired” to each other, and see that it’s an elegant modeling trick that we could use to model or control lots of different things, just by changing which genes are “wired” to which other genes.

    That’s plenty for genetic regulatory networks to clearly be computers—analog computers much like the Norden computer.

    But the gene-processing mechanisms of the cell are something more interesting—they’re a programmable computer, and the operations of the genes are the execution of the program encoded in the genes.

    Notice that I said a computer is generally reconfigurable in principle.

    A program is just an easy and general way to reconfigure a computer. Rather than moving patch cables around to change the actual wiring, or using front-panel switches to get the same effect (connecting the other components differently), you use some memory somewhere inside the computer to hold information about how to reconfigure the computer automatically.

    That’s what DNA is for. It consists of text-like strings of units that specify instructions, and automatic operations on those instructions “rewire the computer” to perform different operations, or to use outputs of some operations as inputs to different operations at different times.

    The genome is a program for a computer that is not reprogrammable in the usual sense—in a given physical computer, the program is essentially fixed.

    (That’s not unlike computers: a lot of “embedded” computers, e.g., in various consumer appliances, have their programs fixed in hardware. You can’t reprogram the individual toaster or whatever, but the designers can reprogram the design and make whole new toasters with the modified program.)

    Still, given that you can see in principle how to reprogram the gene computer as a production system—making new sets of genes that correspond to new genetic regulatory networks, etc.—it’s clear that it’s basically a programmed computer.

    We can actually bypass the usual redesign (evolution) and functioning of the system and re-program it in awkward ways, using genetic engineering, but the really important thing to notice is that the way genes are used to implement genetic regulatory networks is a kind of programmed computer operation.

  44. johnharshman says

    Might I suggest that brevity is the soul of intelligibility? You say too much in one chunk to comprehend, and there are too many side branches.

    We can agree that the notion that everything computes and is thus a computer is useless and can be dispensed with. Nor is it very useful to consider a definition that encompasses thermometers, since too many things are thus trivially computers.

    Let’s try for clarity. If a computer is a system that can in principle be used to model other systems, is it a computer when it isn’t being used for such a purpose, and perhaps has never been so used?

    And could you clarify this too?: In an organism, what is the computer, exactly? What is it being used to model? And what is the computer program, exactly?

    Briefly, if you can.

  45. Paul W., OM says

    John,

    It’s difficult for me to write briefly when I don’t know my audience’s background, don’t get much clear feedback about how they’ve understood or misunderstood what I’ve said so far, etc.

    I tend to write something with enough detail to clarify all the things I think are likely to be misunderstood.

    Besides, questions like “what is a computer” may seem simple, but the answer can’t be both simple and correct. It’s actually kind of weird.

    If a computer is a system that can in principle be used to model other systems, is it a computer when it isn’t being used for such a purpose, and perhaps has never been so used?

    Yes. Being a computer is about what something can do, and in fuzzy cases about what it could do if it was slightly different. The computer’s actual provenance doesn’t matter a bit—whether it’s intelligently designed, or evolved, or just assembled by a bizarre coincidence.

    And could you clarify this too?: In an organism, what is the computer, exactly?

    Each cell contains a computer, which is the machinery used to transcribe genes and to use products of gene transcription to influence other genes (and other things). All of the mechanisms of basic gene transcription are part of the computer, including the plasm that holds the genes and transcription machinery, and the transcription factors in solution.

    (The plasm holds other stuff doing other stuff, but one of its major functions is to be the “working memory” of the cell’s computer, holding the “variable values” in the form of concentrations of chemicals.)

    Those little computers inside cells may communicate with their neighbors, and with other cells elsewhere, using chemicals that pass through the cell membrane to the next cell, and in some cases into the bloodstream to be picked up by a cell far away in the body.

    What is it being used to model?

    Various aspects of the status of the cell and the processes of operating it—e.g.,

    1. which non-computational chemicals are out of bounds, e.g., those involved in metabolism, and what to do about it. (E.g., do we have enough ATP, and/or too much CO2?)

    2. which mechanical things are broken and need to be fixed (e.g., do we detect a chemical that belongs in the cytoplasm but not the nucleoplasm, suggesting that the nuclear membrane is ruptured somewhere?)

    3. which stage we’re in of various cell-level biological processes—e.g., growth, differentiation, division.

    4. which stage we’re in of various subcellular biological processes, e.g., making nanomachines internal to the cell.

    I’m not a biologist, and generally can’t tell you which processes are monitored or controlled directly by the genes, vs. by mechanical or chemical mechanisms built and/or controlled by the genes, but you can be sure all of these kinds of things are going on. It’s like running and dynamically rebuilding a complicated factory—there’s a lot to keep track of and respond to.

    And what is the computer program, exactly?

    The genes are the program. Each gene is one instruction in a production system, and transcription of the gene is the execution of that instruction.

    Some instructions do input or output—e.g., respond to chemicals in the cell that are not produced by other genes (input) or produce an RNA or (indirectly) a protein that is part of a self-assembling nanomachine, or upregulates or downregulates some autocatalytic reaction, or whatever.

    Many instructions (genes) do neither, and perform whatever computation needs to be performed to process inputs into outputs. They are purely computational, serving only to influence the activity of other genes.
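
    A deliberately crude sketch of that picture, treating each gene as a rule that fires when its activators are present and its repressors are absent. The gene and molecule names are hypothetical, and presence is reduced to a yes/no value purely to keep the example short.

```python
# Hypothetical genes: one input rule, one purely computational rule,
# and one output rule. None of these names come from real biology.
GENES = {
    "sensor": {"activators": {"low_atp_signal"}, "repressors": set(), "product": "tf_a"},
    "relay": {"activators": {"tf_a"}, "repressors": {"tf_stop"}, "product": "tf_b"},
    "effector": {"activators": {"tf_b"}, "repressors": set(), "product": "enzyme"},
}

def step(present):
    """Fire every gene whose activators are present and repressors absent."""
    updated = set(present)
    for gene in GENES.values():
        if gene["activators"] <= present and not (gene["repressors"] & present):
            updated.add(gene["product"])
    return updated

def run_to_fixpoint(present):
    """Keep stepping until no rule adds anything new."""
    present = set(present)
    while True:
        updated = step(present)
        if updated == present:
            return present
        present = updated
```

    Here "sensor" plays the input role, "effector" the output role, and "relay" does nothing except pass an influence from one gene to another, which is the purely computational kind of instruction.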

  46. Owlmirror says

    Continental drift makes a lousy computer, because it’s not clear what else it could be used to model or control, even in principle—e.g., even if you had the godlike ability to reconfigure it by moving continents around, changing the rate of the earth’s rotation, etc.

    Hm.

    Could plate tectonics be said to be a computer that models convection currents in the mantle?

    (NB: “Continental drift” is an obsolete term, as I understand).

  47. Paul W., OM says

    Could plate tectonics be said to be a computer that models convection currents in the mantle?

    I don’t think so, at least not in any nontrivial sense, to any nontrivial degree.

    There are at least two senses of “modeling” that are relevant.

    One is whether one thing is simply responsive to something else, e.g., when the curvature of a laminated strip thermostat models the ambient temperature.

    The laminated strip by itself is really just a temperature sensor. You need something else to make it compute the thermostat function, e.g. to put a metal contact in the right place so that the strip will touch it when it bends far enough, and complete a circuit.

    That’s a still-very-simple but qualitatively much more interesting kind of model. When we place the contact where we want it, we’re modeling something else about the world—the temperature we care about, e.g., the temperature at which I’m on the verge of feeling disagreeably warm, so the air conditioner should come on.

    When I install and adjust the thermostat, I’m setting up a simple computer model of certain aspects of a certain physical system: my house, with me in it. I’m only modeling a couple of simple aspects of the system, namely the temperature inside the house, and my threshold for being too hot.

    The thermostat uses those two pieces of input—one a varying sensor input and the other a fixed data input—to model a third aspect of the system: am I on the verge of being too hot right now?

    That’s more interesting as computation because the model automatically figures something out without being directly told; it’s given the first two facts and uses them to infer the third.
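
    Put as a tiny sketch: one varying input (temperature), one fixed input (where the contact sits), and one inferred output (is the circuit closed?). The linear bending model and its constants are assumptions chosen for illustration, not properties of any real strip.

```python
def strip_deflection(temp_c, rest_temp_c=20.0, mm_per_degree=0.1):
    """Assumed linear model: the strip bends further as it gets warmer."""
    return (temp_c - rest_temp_c) * mm_per_degree

def circuit_closed(temp_c, contact_position_mm):
    """The strip touches the contact once it has bent far enough."""
    return strip_deflection(temp_c) >= contact_position_mm
```

    Moving the contact is the "fixed data input": placing it at 0.5 mm in this toy model amounts to saying the threshold for being too hot is about 25 degrees.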

    For plate tectonics to “model” aspects of mantle convection in that interesting sense, you’d have to find something in the mantle system that the plate system isn’t “directly told” by the mantle system, but it manages to “infer” about the mantle system from what it is told.

    Assume that outer mantle flows (and forces on plates) are a function of things like the temperature of the Earth’s core, the viscosity and thermal conductivity of the mantle material, and so on.

    If some process in the plate system took the given information and combined it to infer any of those things, that could be an instance of modeling in the more interesting sense, e.g., if you found that one continental plate rotates clockwise when the earth’s core heats up, while another subducts especially fast when the mantle conductivity is high.

    For such “inference” to count, it’d have to be doing something nontrivial, really combining information according to logical or arithmetic relations.

    One thing that wouldn’t count would be if all these things held the same information—e.g., that as the earth’s core heats up, the mantle viscosity always goes down and its conductivity always goes up, and the force on the plates always goes up, and a certain plate always subducts faster.

    We could say that the rate of subduction of that plate “models” all those other things about the mantle—heat, conductivity, viscosity, rate of convection, etc.—in a trivial sense, but what we have is just a sensor input that is correlated with all of those things, but specifically about none of them in particular. It doesn’t have a special relation to any particular thing going on down there, and doesn’t extract any information —we have to do most of the work by interpreting it as being about heat, or about viscosity, or whatever.

    In general, the simpler a system is, the more things it’s analogous to, and can be said to model. But if it’s too simple, it’s too easy, and it’s not interesting to say that one thing is “a model of” the other.

    For example, any unchanging aspect of any system can be said to (trivially) model any unchanging aspect of any other system. (I could say that the percentage change in the number of moons orbiting Earth is a pretty good model of the human birth rate in the Marianas Trench, because they’re both consistently zero.)

    Saying that some system models some other system gets a whole lot more interesting as those systems get more complicated. Each additional thing or relationship between things that you add to one system rules out many more systems it might be said to be a model of: any system that doesn’t have a corresponding entity or relationship doesn’t count. (Under whatever correspondence you use to relate the two systems.)

    The general idea is that the more detailed your description of a system is, the fewer actual systems will count as meeting that description. Trivial systems have lots of parallels in other systems, but nontrivial ones generally don’t, and complicated ones have very, very few unless something funny is going on. (Like design or evolution putting one system into some useful correspondence with another.)

    So far as I know, there aren’t any nontrivial relationships in the plate system that model any of the nontrivial inner workings of the mantle system, and there’s no good reason to expect much of that.

    I wouldn’t entirely rule it out, though, because wherever you have feedbacks and dynamic stability, you have a decent chance of some kind of emergent complexity, where things get locked into resonance with each other, or something like that. You might find some nontrivial modeling of things in the mantle by things in plate tectonics. (Or just oddball patterned interactions that aren’t easily construed as one thing modeling another.)

  48. johnharshman says

    Paul:

    “I didn’t have time to make it shorter” is attributed to many different writers, but I still appreciate the sentiment. Consider taking more time. It really does improve clarity. (As a guide, you can assume I know at least as much about biology as you do, but much less about computer science.)

    So the computer is

    …the machinery used to transcribe genes, use products of gene transcription to influence other genes (and other things). All of the mechanisms of basic gene transcription are part of the computer, including the plasm that holds the genes and transcription machinery, and the transcription factors in solution.

    This seems an odd and arbitrary circumscription to me. The transcription machinery would include various proteins, most especially the RNA polymerase complex. “Use products of gene transcription” is much too vague. What machinery uses products of gene transcription? The products of gene transcription are RNAs. Do RNAs influence genes? Perhaps; there are RNAs whose functions aren’t known. But I don’t know of any such. Then you mention transcription factors, which are not products of transcription. You may be conflating transcription with translation, and intend to include the translation machinery too. Based on this definition, including what it doesn’t say, I would imagine the computer to extend to the entire cell, including all its membranes, proteins, and various other molecules. It isn’t clear.

    And the computer program is

    The genes are the program. Each gene is one instruction in a production system, and transcription of the gene is the execution of that instruction.

    I don’t see how you make this equation. Why is a gene one instruction rather than part of an instruction or multiple instructions? Why does transcription count as execution? It seems a quite vague metaphor. Still, I think you may have achieved clarity on one thing: the genome is the computer program, if you mean “genome” by “the genes”.

    It’s being used to model

    Various aspects of the status of the cell and the processes of operating it

    —which you proceed to exemplify. I may be unclear on what “model” means. Apparently any sort of feedback or stimulus and response is a model?

    I’m not a biologist, and generally can’t tell you which processes are monitored or controlled directly by the genes, vs. by mechanical or chemical mechanisms built and/or controlled by the genes, but you can be sure all of these kinds of things are going on. It’s like running and dynamically rebuilding a complicated factory—there’s a lot to keep track of and respond to.

    That all depends on what you mean by “monitored” or controlled. The genome does only three things: it serves as a template for RNA transcripts, it serves as a template for its own replication, and it contains binding sites for various other molecules, the latter often influencing the rate of transcription. The genome controls nothing directly, unless you count a binding site as “controlling” binding. Some of the RNA transcripts do things themselves, some are translated into proteins that do things, and some are junk. But are these part of the program or part of the computer?

    I don’t really see a way to separate the genome (“program”) from the rest of the cell (“computer”), and it seems that you don’t either. Is that a problem?

    To take another tack, if this cell=computer and genome=program thing is more than a vague analogy, we should be able to make some use of it. You should, for example, be able to predict some feature of the system or its behavior that we don’t already know about. Can you?

  49. Paul W., OM says

    johnharshman:

    Sorry, I simply can’t “take more time” than I’m already putting into this.

    If this level of effort isn’t good enough, we’ll just have to stop.

    Each cell contains a computer, which is the machinery used to transcribe genes, use products of gene transcription to influence other genes (and other things). All of the mechanisms of basic gene transcription are part of the computer, including the plasm that holds the genes and transcription machinery, and the transcription factors in solution.

    This seems an odd and arbitrary circumscription to me. The transcription machinery would include various proteins, most especially the RNA polymerase complex. “Use products of gene transcription” is much too vague. What machinery uses products of gene transcription?

    Sorry, I don’t understand what you’re not understanding here, and you might have mis-parsed something. (Which is why I restored the first part of the first sentence, above.)

    I am basically using the same abstractions as molecular biologists when they describe “genetic regulatory networks.”

    When you talk about gene interactions, and make a network diagram, you generally ignore most of the fiddly little machinery that actually makes it go. It doesn’t matter that the coding part of the gene is transcribed to RNA first, and then that’s further transcribed to a protein, and the protein is folded, and the protein bounces around in the plasm until maybe it docks to a binding site. All that matters is that somehow, something is produced that will promote or repress the firing of other genes, and it gets where it needs to be in order to do that, with a certain approximate probability. Typically that information-carrying product is a molecule of a protein, with appropriate-shaped binding sites to dock to the right part of the control regions of other genes, but none of that detail matters at the level of genetic regulatory networks. All that matters is that whatever is produced by the activity of one gene, by whatever means, has the right effects on the activity of other genes.

    When I say that the computer consists of all the miscellaneous machinery to do all that, that is not “oddly circumscribed” at all. It’s exactly how the abstraction of a genetic regulatory network is an abstraction. It’s precisely all the stuff that makes it go, that you ignore when you talk about GRNs.

    And I’m going one level down, and talking about individual gene actions, which affect other genes’ activity probabilistically, while still ignoring all the fiddly stuff that makes it go.
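
    As a rough illustration of that level of abstraction, here is a minimal sketch of a three-gene regulatory network written as a signed graph with a Boolean update rule. Everything in it is an assumption made for the example: the gene names, the wiring, and the rule that a gene is on when at least one activator is on and no repressor is on. Nothing about RNA, translation, folding, or diffusion appears anywhere in it.

```python
# Hypothetical three-gene network; names and wiring are illustrative only.
# Each gene maps to a list of (regulator, sign): +1 promotes, -1 represses.
REGULATORS = {
    "A": [],                      # no regulators: its state is held fixed
    "B": [("A", +1)],             # A promotes B
    "C": [("A", +1), ("B", -1)],  # A promotes C, B represses C
}


def step(state):
    """Compute the next on/off state of every gene from the current one."""
    new_state = {}
    for gene, regulators in REGULATORS.items():
        if not regulators:
            new_state[gene] = state[gene]
            continue
        activated = any(state[r] for r, sign in regulators if sign > 0)
        repressed = any(state[r] for r, sign in regulators if sign < 0)
        # Assumed rule: on if some activator is on and no repressor is on.
        new_state[gene] = activated and not repressed
    return new_state


def run(state, n_steps):
    """Return the list of states visited, starting with the initial one."""
    history = [state]
    for _ in range(n_steps):
        state = step(state)
        history.append(state)
    return history


history = run({"A": True, "B": False, "C": False}, 3)
c_over_time = [s["C"] for s in history]
```

    With this wiring, gene C switches on for one step and is then shut off once B comes on. The only things the description needs are which genes affect which others, and in which direction.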

    It doesn’t seem to me that I’m saying anything controversial there, in biological terms. Biologists talk that way about gene activity all the time, when they’re interested in that level of description and higher-level patterns of gene network operation.

    What most biologists don’t realize is that that’s what computer scientists do, too, when talking about instructions and programs. When we talk about “an instruction,” we ignore all the fiddly machinery in the computer that actually makes it go, and speak in terms of inputs to the instruction, outputs from the instruction, and under what circumstances the instruction will execute.

    We usually ignore all the fiddly mechanisms that make it work because we can, and sometimes because we have to—they differ between machines that have the same instruction set.

    And we have a name for all those fiddly little mechanisms that actually make instructions go, which we usually ignore when we talk about an instruction “executing” at the machine code level of abstraction or any higher level of abstraction.

    We call that “oddly circumscribed” collection of mechanisms the computer. It’s whatever does the computing.

    That is all it means to be a computer executing a program—it is exactly a collection of fiddly mechanisms that make instructions go, and it can be any collection of any kinds of mechanisms that can make the instructions go.

    The complicated mechanisms of gene transcription do not matter at the level of GRNs.

    That’s just like how the complicated mechanisms in a modern human-made digital computer don’t matter when you’re thinking about programs at the level of machine code—you can ignore all the thousands or millions of transistors intricately arranged to actually make an instruction go (instruction fetching, instruction decoding, internal bus switching, register reading, functional unit operation scheduling, more internal bus switching, and register writing, plus mechanisms that bypass a lot of that a lot of the time, and do something else that’s equivalent but faster.)

    As long as it works to determine (1) which instructions execute when, (2) what data values they produce, and (3) how those values affect the operation of other instructions, it’s all good.

    When molecular biologists abstract away from RNA intermediates and protein folding and all the associated machinery, and just talk about genes and promoters and repressors and so on, they’re doing the same thing computer scientists do when they talk about instructions executing, without talking about all that stuff about fetching and decoding and internal buses and registers and so on.

    And I don’t think it’s just a convenient way of talking. It’s scientifically important.

    The fact that biologists can find pretty good abstractions of things like genetic regulatory networks suggests that evolution effectively divided things up that way fairly cleanly, because it works.

    And I think it works for reasons deeply related to why human computer designers do that too.

    Then you mention transcription factors, which are not products of transcription.

    They are downstream products of transcription, right? In terms of GRN’s, what matters (to gene regulation) when a gene is transcribed is whatever molecule ultimately ends up in the plasm to control which other genes are transcribed later. That is typically a folded protein with binding sites that let it dock to sites in the control region of other genes to enhance or reduce their chances of being transcribed, right?

    And that molecule—the final product that controls other genes—is called a transcription factor, right? (Or have I misread Wikipedia?)

    You may be conflating transcription with translation, and intend to include the translation machinery too.

    Uh, whut?

    Probably, though not exactly “conflating.”

    Are you saying that the term “transcription” only covers the transcription of a DNA sequence into a corresponding RNA sequence?
    Is “translation” the transcription of that RNA sequence into a protein sequence, or what? (If so, does it include folding the protein?)

    Whichever way that goes, I’m using the term “transcription”, perhaps loosely, to include all of that—all the stuff that “makes a gene go” and results in an end product that can affect whether other genes go.

    And I could swear I’ve read biologists talking the same way, maybe speaking loosely and ignoring those post-processing steps after the transcription to RNA, presumably for the same reasons I do: they usually don’t matter at the level of which genes turn which other genes on or off. As long as they get the job done and the right product is produced, you can ignore them, just as you usually ignore the internal bus-switching and register updating implicit in executing an instruction on your PC. It’s just all that fiddly shit that has to happen to get the information produced by one instruction wherever it needs to go, to affect the operation of subsequent instructions in the right way.

    Just as we don’t want to think about all that fiddly teeny hardware crap when programming a PC, biologists don’t want to think about translation into proteins, protein folding, and diffusion through the plasm when sorting out which genes produce which products that affect which other genes. Exactly how those products are physically produced by various machinery doesn’t matter for most purposes, so biologists too ignore the intermediate steps when working at the level of GRN’s (or just slightly below that, which is what I’m mostly talking about in terms of basic instruction execution).

    Based on this definition, including what it doesn’t say, I would imagine the computer to extend to the entire cell, including all its membranes, proteins, and various other molecules. It isn’t clear.

    Should it be clear, i.e., simple, in the way you seem to want?

    Consider a desktop computer, with a central processing unit, memory banks, disk drives, a power supply, a case, a monitor with a swivel stand, a printer, etc., and stuff to connect all those things.

    Is that all “a computer,” or is the computer just the CPU which, you know, does the actual computing? Or maybe the CPU and RAM, which holds instructions and intermediate products, because storing intermediate results of computing is part of computing. And the disk drives, because they do that too.

    But what about the power supply? The computer isn’t going to compute without that, so it’s part of the computer in that sense, but it’s not part of the computer in the same sense; it doesn’t store or process information for the computations we’re talking about. (Though it may have its own little computer inside to control how it regulates power, etc.)

    It shouldn’t be surprising that the computer in a cell is the same way—you can draw the boundary of the computer narrowly or broadly.

    The computer in the cell isn’t going to do any computing without ATP to drive the machinery, so the Krebs cycle is part of the computer broadly speaking, but not narrowly speaking. It doesn’t provide, process, or store the information we’re talking about. Some genes presumably exert some control over the Krebs cycle, but that’s not the same thing. (Your desktop’s CPU may exert control over a power supply, too, e.g., by sending signals to its internal computer, but there’s an important sense in which that’s a different computer, doing a very different job.)

    And the computer program is

    The genes are the program. Each gene is one instruction in a production system, and transcription of the gene is the execution of that instruction.

    I don’t see how you make this equation. Why is a gene one instruction rather than part of an instruction or multiple instructions?

    For the very same reasons we call instructions in your desktop computer instructions:

    1. They do fairly simple computation, taking information-bearing inputs and producing information-bearing outputs related to those inputs in fairly simple information-processing ways.

    2. They generally execute in an all-or-nothing fashion; you either do it or you don’t. (Minor exceptions aren’t a problem.)

    3. They can execute in different orders, depending on how you connect them up.

    4. They can be used to perform larger and more complex computations, with the output of one instruction serving as an “intermediate value” useful only as the input to another instruction.

    5. They have a discrete representation, at some level of abstraction, which makes it possible, at least in principle, to revise the program by adding and/or “editing” instructions. You can connect or reconnect them by changing DNA sequences that result in binding site shapes (after transcription to RNA, translation to protein, and folding).

    6. The processor can somehow find the instructions and operate on them and only them as instructions, ignoring other physical objects and patterns that are not instructions. The execution of instructions and the information processing it implements is thus mostly insulated and autonomous from various other physical processes going on in the system. That autonomy doesn’t have to be perfect, especially in a stochastic computer, but it has to be pretty good.

    (A subtlety there is that in a von Neumann computer, programs are stored in the same memory as data, and you can actually manipulate programs as data, then turn around and execute them as code. Still, the processor doesn’t normally stumble into data and start executing it as code. The necessary autonomy of code from everything else is provided by constraining the structure of code to only step or jump to other code, usually prewritten as code, but sometimes generated on the fly as data.)

    7. Information being processed is likewise insulated and autonomous from other aspects of the physical state of the machine. E.g., most chemicals that could act like transcription factors are simply kept out of the plasm used to store transcription factors, and other molecules in the same plasm serving non-informational functions don’t dock to binding sites.

    (There are similar issues in PC’s, e.g., keeping various stray electrical currents from the power supply or other computational units from leaking into information-processing components and affecting voltages enough to count as flipping a bit. And of course, things like having a case with rubber feet for the computer, to avoid electrical shorts and physical shocks that might damage or “reconfigure” the computing components. In the gene computer, much of the isolation is done by selectivity of shape-matching, especially at binding sites, but presumably also in lots of other ways—e.g. with ribosomes effectively ignoring a lot of other crap floating around the plasm, mitochondria having membranes to keep their internal goo and nanomachines to themselves, etc.)
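
    The list above can be sketched as a tiny stochastic production system. This is only an illustration under loudly stated assumptions: `GeneRule`, `cycle`, the product names (`signal`, `tf1`, `tf2`, `tf3`), and the firing probabilities are all hypothetical, products never decay, and every rule is checked against the same snapshot of the products present at the start of a cycle. It is not a model of any real gene network.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRule:
    """One hypothetical 'instruction': a condition plus the product it makes."""
    name: str
    requires: frozenset       # products that must be present (activators)
    blocked_by: frozenset     # products that prevent firing (repressors)
    product: str              # the information-bearing output
    fire_probability: float   # chance of firing when the condition holds


def cycle(rules, products, rng):
    """Run one cycle: each enabled rule fires, or not, as a whole."""
    produced = set()
    for rule in rules:
        # Conditions are checked against the products present at the start
        # of the cycle, so the order in which rules are listed is irrelevant.
        enabled = rule.requires <= products and not (rule.blocked_by & products)
        if enabled and rng.random() < rule.fire_probability:
            produced.add(rule.product)
    # Assumption: products accumulate and never decay.
    return set(products) | produced


RULES = [
    GeneRule("g1", frozenset({"signal"}), frozenset(), "tf1", 1.0),
    GeneRule("g2", frozenset({"tf1"}), frozenset(), "tf2", 1.0),
    GeneRule("g3", frozenset({"tf1"}), frozenset({"tf2"}), "tf3", 1.0),
]

rng = random.Random(0)
state = {"signal"}
for _ in range(2):
    state = cycle(RULES, state, rng)
```

    Here `tf1` is an intermediate value in the sense of point 4: it matters only as the input to `g2` and `g3`. The order of execution comes from how the rules are connected, not from their position in the list (point 3). Setting `fire_probability` below 1.0 makes each rule fire only some of the time.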

    Why does transcription count as execution?

    See above.

    (“Transcription” as I’m loosely/broadly using it, meaning everything involved in activating a single gene and producing a product that affects the activity of other genes, counts as instruction execution for basically the same reasons instructions count as instructions: the definitions of “instruction” and “instruction execution” are necessarily interdependent.)

    It seems a quite vague metaphor.

    It’s just not a metaphor at all.

    Seriously, it’s not, unless calling anything a “computer” is a metaphor. (And some people would say that, and I can see why, but they’re wrong.)

    It’s not a particularly vague non-metaphor, either, as I hope you can see from the above list of 7 points about instructions. Gene processing has all the necessary features of instruction execution, and that’s all it means to be instruction execution.

    Likewise, the cell contains all the machinery to do that, and any such assemblage of machinery that can do that is precisely and literally a computer.

    I hope that’s getting clearer, but maybe I should ask this:

    What is it that you think is missing? How is that not a computer? What do you think a computer is, that this isn’t, and why does that matter?

    And conversely, consider a gene that only responds to other genes, and only controls other genes. It doesn’t do input, and it doesn’t do output.

    If it’s not performing a little computation what is it doing? What would you call that kind of thing? How is that different from a computation?

  50. ChasCPeterson says

    From a disinterested passerby, for the record:

    the term “transcription” only covers the transcription of a DNA sequence into a corresponding RNA sequence…
    Is “translation” the transcription of that RNA sequence into a protein sequence…?

    This is indeed the conventional terminology, the preferred nomenclature. (It’s actually a nice metaphor, with nucleic-acid language being ‘transcribed’ into another form of nucleic-acid language, and then the nucleic-acid message being ‘translated’ into protein language. Metaphorical ‘languages’ stipulated. Different programming languages, to you?)

    (If so, does it include folding the protein?)

    no.

  51. Paul W., OM says

    oops, where it says “It seems a quite vague metaphor” should have been quoted, and the subsequent stuff is my response.

  52. Paul W., OM says

    Thanks, Chas.

    Is there a single term for the whole shebang, from transcription through translation to folding and the resultant protein?

    If so, I’d be happy to use it.

  53. johnharshman says

    Paul,

    The word you are looking for is “expression”. A protein is expressed if it is produced, which of course happens through transcription, sometimes post-transcriptional processing, subsequent translation and folding, and sometimes transport and post-processing. “Expression” takes care of all the fiddly bits you want to talk about. Of course there’s a lot going on that doesn’t involve proteins, just untranslated RNAs, so you might want to think about that too.

    I’m willing to accept that you are incapable of brevity. Do your best, but be aware that you aren’t communicating in the optimum way. I would appreciate, however, if you would try to answer all the questions I ask; this too would aid communication. You left out a response to perhaps the most important ones, which I will repost below for convenience. I’ll respond to the rest later.

    I’m not a biologist, and generally can’t tell you which processes are monitored or controlled directly by the genes, vs. by mechanical or chemical mechanisms built and/or controlled by the genes, but you can be sure all of these kinds of things are going on.

    That all depends on what you mean by “monitored” or controlled. The genome does only three things: it serves as a template for RNA transcripts, it serves as a template for its own replication, and it contains binding sites for various other molecules, the latter often influencing the rate of transcription. The genome controls nothing directly, unless you count a binding site as “controlling” binding. Some of the RNA transcripts do things themselves, some are translated into proteins that do things, and some are junk. But are these part of the program or part of the computer?

    I don’t really see a way to separate the genome (“program”) from the rest of the cell (“computer”), and it seems that you don’t either. Is that a problem?

    To take another tack, if this cell=computer and genome=program thing is more than a vague analogy, we should be able to make some use of it. You should, for example, be able to predict some feature of the system or its behavior that we don’t already know about. Can you?

  54. Paul W., OM says

    johnharshman:

    I didn’t leave out the answers to those questions, I just hadn’t gotten to them yet.

    And actually, I started answers to those questions, but they started to get too long, and I cut them.

    Then I had to leave, so I posted what I had, figuring I’d get to those later.

    Which puts me in a quandary. I do not know how to briefly answer the question of what it means to model something. The word has several senses, and at least two are very relevant to the biology we’re discussing, and we don’t have much shared vocabulary.

    If you’re wondering why I’m incapable of brevity in this context, you’ve just given me a couple of good reasons I haven’t been brief, and should not be brief.

    About the word “expression,” for which I was substituting “transcription”—doh! I knew that. I’ve used that word before, talking about this stuff, and used it correctly. I don’t know why, this time around, the word “expression” didn’t come to mind, and “transcription” did, except to say it’s technical jargon—the terms could be reversed and make almost as much sense—and that I’m not a fluent speaker of that jargon. I know the relevant concepts (so far), and I know the words, but I don’t always use the right words for the right concepts.

    One reason I was verbose in earlier posts was precisely to nip such misunderstandings in the bud—if I describe the process of using a gene to construct a protein via RNA and call the whole thing “transcription,” you can see that I’m using the wrong word and correct me, and we can establish the correct terminology, and that we share it, and move on.

    I realize that you tried to correct me before, guessing that I really meant “transcription” when I used that word, but that I was misusing “transcription factor”. One reason I often verbosely “paint a picture” in this kind of context is so that if I use one word wrong, but the others all fit together correctly, you’re more likely to be able to tell which term is the problem. That didn’t work this time, but it often does.

    As for modeling, here is a brief and necessarily vague description of the two kinds of models I’m mostly concerned with:

    1. Analogical models, which are not necessarily analog, and usually aren’t… that’s what I’ve mostly talked about before. An example is using a binary number to represent, say, the height of something. Under a weird mapping (binary number representation), a “bigger” number represents a bigger height, and some relation in the program between two numbers like that may represent a relation in the world between the two values represented by those numbers, e.g., that one thing is bigger than another thing being modeled.

    2. Fitted models (I just made that term up for this conversation. I know technical terms for more specific versions of the idea, but not the term for the very general category.)

    A fitted model implicitly models something else by being fitted to it, in ways that may only be clear when you understand its function.

    A good example of a fitted model is the control program for a kind of flagellate that swims up chemical gradients to higher concentrations of a chemical, to find food. (I’m not sure which flagellates do this, or if I’ve got this exactly right, but I don’t think that matters.)

    The flagellate uses a chemical sensor to measure the approximate concentration of the relevant stuff in its environment, and remembers that “number.” Then it swims straight ahead for a while, and measures the concentration again. If the concentration is higher, it keeps swimming straight ahead. If the concentration is lower, it suddenly stops swimming, and tumbles to a random orientation. Then it repeats that whole process, swimming straight ahead and either keeping going or tumbling again.

    This very simple, routine, reflex-like activity works for seeking food, because if the flagellate is getting closer to the food, the concentration of the relevant chemical will be higher, and it will keep swimming straight ahead, and get closer and closer to the food—except that it’s unlikely to be pointed directly at the food, and it will miss the food, and keep going away from the food.

    That’s what the tumbling stop is for. When the flagellate has gone too far along its current line, the concentration will start to drop, and it needs to turn toward the food to swim closer to the food.

    The tumble turns the flagellate in a random direction, which may or may not get it closer to the food, and it tries swimming straight in that direction for a while. If that turns out to be a better direction, it keeps swimming straight, and if not, it tumbles to find a different random direction. On average, its bad random “guesses” cost it less than it gains from exploiting the good guesses, and it gets closer and closer to the food via some path that is just a biased random walk.

    This routine exploits a lot of regularities in the environment the flagellate lives in, and regularities in the way the flagellate itself works (e.g., that it can swim in a fairly straight line, and tumble in a fairly random way when it stops suddenly.)

    All those regularities and a very few, very appropriate reflex-like behaviors let the flagellate use a very minimal analogical model of its environment—the number representing the current concentration of the chemical it’s sniffing, and the number representing the concentration the last time it sniffed. Just by comparing those two numbers, it can tell if it’s been making progress lately, and should guess that it should keep doing what it’s been doing, or has not, and it’s time to try something else—anything else.

    There is no explicit representation of a path toward a goal, just simple interactions with the environment from which an actual path emerges, a piece at a time.
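
    A minimal simulation sketch of that swim-and-tumble routine, under stated assumptions: the world is a flat 2D plane, the food sits at the origin, the `concentration` function is a made-up field that simply falls off with distance, and the step length, starting point, and seed are arbitrary. The only state the swimmer keeps is its heading and the one remembered sniff value.

```python
import math
import random


def concentration(x: float, y: float) -> float:
    """Hypothetical attractant field: highest at the food, at the origin."""
    return 1.0 / (1.0 + math.hypot(x, y))


def run_and_tumble(start, n_steps, step_length=1.0, seed=0):
    """Simulate the swimmer and return the list of positions it visits."""
    rng = random.Random(seed)
    x, y = start
    heading = rng.uniform(0.0, 2.0 * math.pi)
    old_sniff = concentration(x, y)   # the one remembered "number"
    path = [(x, y)]
    for _ in range(n_steps):
        # Swim straight ahead for a while, then sniff again.
        x += step_length * math.cos(heading)
        y += step_length * math.sin(heading)
        new_sniff = concentration(x, y)
        # The single bit of information: am I making progress or not?
        if not new_sniff > old_sniff:
            # No progress: stop and tumble to a random orientation.
            heading = rng.uniform(0.0, 2.0 * math.pi)
        old_sniff = new_sniff
        path.append((x, y))
    return path


path = run_and_tumble(start=(100.0, 0.0), n_steps=2000)
start_distance = math.hypot(*path[0])
final_distance = math.hypot(*path[-1])
```

    Nothing in the loop represents a route or a target location. The swimmer ends up near the food only because the comparison `new_sniff > old_sniff` is paired with the right two reflexes in the right kind of environment.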

    Still, we can and should say that the flagellate has a very useful model of its environment—the one remembered value and the currently sensed value are only a couple of crude, approximate numbers, from which it extracts only a single bit of useful information each time—am I making progress or not?—but if you’re only going to have one bit of information about your situation to base a decision on, you could hardly do better.

    Much of the knowledge about how to get to food is represented “procedurally”—those two numbers and one bit wouldn’t make any useful sense, except in combination with the reactive routines that use them. The organism “knows how” to get to food, but doesn’t “know why” doing that works to get to food.

    We say that the procedural part—the decision to keep going or the decision to stop and tumble—is part of the “model” that the organism has of its environment.

    That may not be intuitive, but you should be able to see that those routines implicitly encode important information that the organism has about its environment—they implicitly say that the organism lives in an environment where these simple responses to those simple pieces of data will help keep it alive.

    That rules out the vast majority of environments, so clearly what’s encoded is at least information extracted from the environment by evolution, if not intuitively a “model of” the environment.

    Intuitively speaking, that implicit information gives the minimal analogical model a lot of leverage toward counting as “clearly a model of” something specific. The minimal model doesn’t get its specificity mainly by having rich internal structure whose details rule out mapping it onto a lot of other systems—it’s just a couple of numbers, and a simple comparison between them, and that could represent all sorts of things, in different contexts. It gets its specificity by being situated in the world in a very, very specific way that serves a very, very specific function. The scalar comparison “(NewSniffValue > OldSniffValue)?” may not seem like much of a “model” of anything, on its face, but when you see how it makes just the right decision for the organism, it’s clear that it’s modeling something subtle and very specific.

    Presumably many computations done by gene expression in the cell are of that general sort, modeling some crucial aspect of the cell’s internal or external situation, and choosing to do something that works. They may not do much computation on fancy declarative representations, but they do just the right computation to take a few inputs and output what to do.

    Some biologists seem to think this kind of peculiar “fittedness” is specially “biological,” and “not like computers,” because they think of computers as executing complicated programs on complicated structures, rather than executing simple reactive routines using simple structures, to continuously interact with their environments in rich ways that depend on the structure of their environments.

    But computers can do that too, and many do. Programs for industrial process control are often very much like that—the program does simple computations that only make sense in terms of interacting with a certain environment—say, sensors and valves in an oil refinery.

    A simple example of that kind of reactive program that relies almost entirely on procedural knowledge is the control program for a Roomba-type robotic vacuum cleaner.

    Like the flagellate, the Roomba mostly wanders around at random, usually going straight until it hits a wall, then turning in a random direction. By wandering around your room (or set of connected rooms) for a good while, it will eventually clean your whole floor. Probably.

    Like the flagellate’s, the Roomba’s random walk is biased in certain simple ways that make it do a much better job than if it just followed straight lines and turned in purely random directions.

    One trick it uses is wall-following. When it hits a wall, it may wander off in a random direction, or randomly choose to follow the wall for a good while. If it’s following the wall and the wall ends at a convex corner, it tends to follow the wall around the corner sometimes.

    Following walls has two functions, one of them interesting.

    The boring function is to ensure that the area right along the wall gets cleaned; it would tend to be slighted by a purely random walk. (Because if the robot hits the wall at a random angle and goes away at a random angle, it will only contact the wall at one point, and not clean its normal swath all the way to the wall. Points very near the wall would tend to go uncleaned for much longer than points away from the wall.)

    The interesting reason for wall-following is to make sure that the robot tends to frequently go in and out of rooms, rather than getting trapped in one room for a long time before it finally follows some straight clear path through the doorway. That makes it more likely to clean your whole floor in a reasonable period of time, because it’s less likely to spend a lot of time trapped in a subset of the rooms, and slight the others.
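
    Here is a minimal sketch of that kind of reactive controller. To be clear, this is not the actual Roomba control program; the mode names, action names, and probabilities are all invented for illustration, and the only thing taken from the description above is the overall shape: wander, sometimes follow a wall, sometimes follow it around a corner.

```python
import random


def roomba_step(mode, bumped, wall_beside, rng,
                p_follow=0.3, p_leave=0.02, p_corner=0.5):
    """Return (new_mode, action) for one control tick.

    mode is "wander" or "follow"; the probabilities are hypothetical tuning knobs.
    """
    if mode == "wander":
        if not bumped:
            return "wander", "forward"
        # Hit a wall: either start following it or bounce off at random.
        if rng.random() < p_follow:
            return "follow", "turn_parallel_to_wall"
        return "wander", "turn_random"
    # mode == "follow"
    if wall_beside:
        # Occasionally give up on the wall and wander off again.
        if rng.random() < p_leave:
            return "wander", "turn_random"
        return "follow", "forward"
    # The wall ended at a convex corner: sometimes follow it around.
    if rng.random() < p_corner:
        return "follow", "turn_around_corner"
    return "wander", "forward"
```

    As with the flagellate, there is no map and no plan in there, just a mode flag and a few biased coin flips that only make sense in rooms with walls and doorways.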

    Do these examples make it clearer what I mean by “modeling and controlling” things? They tend to go together, because what makes the analogical model an analogical model of something in particular may depend on procedural knowledge that “tunes” the analogical model to a precise aspect of the thing that’s being modeled.

  55. Paul W., OM says

    BTW, I haven’t forgotten to answer the rest of your questions; I just haven’t gotten to them yet.

    Some of the RNA transcripts do things themselves, some are translated into proteins that do things, and some are junk. But are these part of the program or part of the computer?

    They’re not part of the program. The genes are the program.

    If an RNA transcript acts as a transcription factor itself (enabling or disabling expression of other genes), then it acts as program data.

    If it doesn’t, and acts as a template for a protein that is a transcription factor, then it is data, used by the computer, but it is not program data—it isn’t the value of a variable in the program, but is data used by the underlying machinery in computing such a value.

    That happens in (human-made, digital) computers all the time. What you see as the execution of one instruction on one or two pieces of data may be implemented in some fairly complicated ways, with several sub-computations on partial results that get combined, in the actual hardware.

    For example, in doing addition, the hardware may use carry bits, which are not part of the input, or part of the output, but are used in computing the output from the input. That’s data somewhere in the computer, but it isn’t part of the program, and isn’t the value of a variable in the program.
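
    A small sketch of the carry-bit point, written as software only to make the intermediate data visible (real adders do this in gates; the function name and the 8-bit default width are my own choices):

```python
def ripple_carry_add(a, b, width=8):
    """Add two non-negative integers bit by bit, the way a simple adder does."""
    result = 0
    carry = 0  # intermediate data: not an input, not an output, not a program variable
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        total = bit_a + bit_b + carry
        result |= (total & 1) << i   # sum bit for this position
        carry = total >> 1           # carry into the next position
    return result  # the final carry out is dropped, as in fixed-width hardware
```

    From the caller’s point of view, ripple_carry_add(5, 3) is one operation on two values. The carry only exists inside the machinery that computes the answer.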

    RNAs that are only used as intermediate representations in the process of gene expression are like that. You can think of gene expression as taking inputs and producing outputs.

    An RNA that is the final product of gene expression and itself acts as a transcription factor can’t be ignored in that way—it must be treated like any other transcription factor, as the value of a variable in the program.

    I don’t really see a way to separate the genome (“program”) from the rest of the cell (“computer”), and it seems that you don’t either.

    Depending on what you mean by “separate from,” I thought I’d already done it. I really don’t know what kind of “separation” you’re talking about, or why you think it’s necessary.

    I thought I’d distinguished between the gene program and the cell’s computer, and don’t know what more you want.

  56. Paul W., OM says

    johnharshman:

    (I’m wondering if I should have addressed this early on, and worked top-down in explaining how an organism is “like” an SPMD distributed computing system, rather than starting at the bottom and showing that a cell contains a literal computer…)

    To take another tack, if this cell=computer and genome=program thing is more than a vague analogy, we should be able to make some use of it. You should, for example, be able to predict some feature of the system or its behavior that we don’t already know about. Can you?

    At this point I think I can explain some known things in fairly clear programming and computer architecture terms, and I think that if what I’m saying is true, it should help make predictions.

    One of the reasons that I want to talk to biologists about things is to refine the model so that it is more explanatory and ultimately more predictive.

    Here’s an ex post facto example.

    If I look at the genome as a program running in a single cell in a production system computer, I notice that there’s something simple and crude about that production system, in programming terms.

    It’s like a production system I’d use as a teaching example, which is easy to understand and easy to emulate in a short program students could write as a project for the week.

    (And in fact, it does resemble production systems I have actually had students implement in CS 101, as a toy project, before I noticed how similar gene expression is.)
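
    For readers who haven’t seen one, a toy forward-chaining production system looks roughly like the sketch below. The rule fields and the example “genes” and “factors” are hypothetical names chosen for illustration, and the sketch simplifies by never removing a factor once it has been produced.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    requires: frozenset     # factors that must be present for the rule to fire
    blocked_by: frozenset   # factors that prevent it from firing
    produces: frozenset     # factors added to working memory when it fires


def run(rules, initial_factors, max_cycles=100):
    """Fire every rule whose condition matches, until nothing new is produced."""
    factors = set(initial_factors)
    for _ in range(max_cycles):
        produced = set()
        for rule in rules:
            if rule.requires <= factors and not (rule.blocked_by & factors):
                produced |= rule.produces
        if produced <= factors:
            break  # fixed point: no rule adds anything new
        factors |= produced
    return factors


# Hypothetical two-rule "genome": gene1 responds to a signal, gene2 to gene1's product.
EXAMPLE_RULES = [
    Rule("gene1", frozenset({"signal"}), frozenset(), frozenset({"tfA"})),
    Rule("gene2", frozenset({"tfA"}), frozenset({"repressor"}), frozenset({"tfB"})),
]
```

    Each rule is a condition-action pair, and the only “variables” are which factors are currently present. Running run(EXAMPLE_RULES, {"signal"}) fires gene1, then gene2; adding "repressor" to the starting set blocks gene2.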

    That is also the kind of thing I’d expect shortsighted evolution to stumble into and latch onto. It is as if evolution was smart enough to barely pass CS 101, but dumb enough to flunk out of CS102, and kept relying on the same crude ideas.

    It makes sense that evolution would likely stumble onto something workable for fairly simple things, and exploit it to the max, but might not find certain really useful enhancements that are not easy to implement in simple, incremental steps that are all useful in themselves.

    One of those features is recursion—as far as I know, the production system implemented by gene expression doesn’t let you define functions in terms of themselves, and compute such functions locally. It doesn’t even provide convenient forms of looping, which can be used to compute a useful subset of recursive functions. You can write crude loops, in awkward but doable ways, and that can be useful.

    Given the nature of the programming language, evolution will not be able to program the computer in certain elegant ways, and will have to resort to the very same stupid hacks that my CS101 students tend to use if they don’t understand recursion or how to use looping elegantly. They try to do too many things with nested conditionals (if-then statements), and when they implement a loop, they get the loop termination conditions wrong.

    That is precisely what you see when you look at Hox genes and the like, for segmented body plans. For example, the Hox segmentation of, say, human embryos uses awkward, ad hoc nested conditions with oddball combinations of flag variables to tell cells which segment they’re part of, instead of starting from a loop and numbering the segments.

    And where you do see an actual loop, as in the generation of thoracic vertebral segments, it’s a kind of sloppy, fucked up loop that still doesn’t leave you with actually numbered vertebrae that let you elegantly express exactly which vertebrae are closer to which end than which others. So in snakes, with many thoracic vertebrae, the number of vertebrae that you get in the thorax is rather variable—a timing trick is used to end the loop, rather than a counter, and while the timing trick has evolved to be remarkably precise, it’s still not as precise as simply decrementing an integer counter would be.
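
    To make the contrast concrete, here is a cartoon of the two kinds of loop termination. It is only a programming illustration, not a model of the actual biology: the function names, the time units, and the jitter parameter are all made up.

```python
import random


def segments_by_counter(n):
    """What a programmer would write: decrement an exact integer counter."""
    segments = []
    remaining = n
    while remaining > 0:
        segments.append(len(segments) + 1)  # each segment gets its own number
        remaining -= 1
    return segments


def segments_by_timer(total_time, period, jitter, rng):
    """Timing trick: keep making segments until a separate clock runs out.

    Each segment takes roughly `period`, give or take a fraction `jitter`.
    Only a count comes out; no segment ever learns its own number.
    """
    elapsed = 0.0
    count = 0
    while True:
        elapsed += period * (1.0 + rng.uniform(-jitter, jitter))
        if elapsed > total_time:
            break
        count += 1
    return count
```

    The counter version always gives exactly n numbered segments. The timer version gives a count that depends on how the jitter happened to add up, so with 5% jitter and a time budget of about twelve and a half periods it can come out anywhere from 11 to 13.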

    It’s something PZ has repeatedly pointed out and remarked on, saying that evolution doesn’t do things the way a human programmer in a programming language would do them. He says computer programming is a bad metaphor for evolution.

    But he’s wrong. It is exactly what you typically see from a clueless beginning programmer and/or a programmer working in a crude, easy to implement, very limiting programming language. Teaching computer science for years and years, it’s what I always see from some bad intro students, or students who just haven’t yet learned how to use the more advanced features of programming languages, but want to hack something together and make it go, right now.

    Evolution does things very much the way ignorant, impatient students do under time pressure, and better programmers all too often do if forced to by programming language limitations and inertia. (E.g. programmers may not be allowed to radically revise the program, which may break things that currently work, even if in the long term it would make things better once you fix those bugs. But they may be allowed to hack it in a couple of places to add a new feature that the marketing department wants.)

    I’ve seen the latter over and over too—good programmers using bad languages, or prohibited from making major revisions to a program, and resorting to similar ugly hacks. It happens all over the place, because of the way large programs evolve by something depressingly analogous to natural selection, under time and marketing pressures and uncertainties of all kinds—various real-world factors that prevent people from taking the time to do things right the first time, or from backtracking very far to improve it, and encourage them to just hack the next feature in with ugly nested conditional statements, etc.

    Even at a small scale, software development resembles natural evolution. Despite being smart, people still ultimately pick between a few likely paths at any given point, without having a good handle on what’s really likely to work well in the long run. And at a large scale, it’s much worse.

    This is a very well-known problem in large-scale software development, and provides much of the impetus for things like “scripting” programming languages embedded in application programs. Scripting languages are there largely so that various features of a single program can be elegantly written and rewritten, rather than just hacked into abominable nestings of incomprehensible conditionals with ad hoc combinations of flags and so on.

    Human software is in many ways a lot less “intelligently designed,” and more stupidly evolved, than most people realize—even most programmers who haven’t worked on a lot of large-scale software projects. Even smart human programmers frequently do the same shortsighted, stupid shit as dumb ones, when working in groups, under time pressures, with uncertainty about the long-term prospects in various senses—and that is more or less the usual case in large scale software.

    Even my own research projects have suffered from that, to a depressing extent, despite being written by just three or four very smart people working fairly closely together in pretty good programming languages.

    We often grit our teeth and hack in the next feature or two to get the next experimental result and publish the next paper, and after we’ve gotten a handful of papers out of the project, our elegantly designed program has been hacked into an embarrassingly ugly and annoying-to-deal-with mess.

    Here’s some more 20-20 hindsight about natural evolution as computer programming…

    One very interesting thing about the computer that the genome program runs on is that in macroscopic animals, it’s a massively parallel computing system, consisting of millions of loosely-coupled processors. In particular, it’s a Single Program Multiple Data (SPMD) machine, with each processor holding the same program, and able to independently execute different parts of the program.

    That has major implications for programming languages and the programming tricks that evolution can and cannot find.

    In particular, it means that even if you can’t implement general recursion locally, on a single processor, you can implement it with many communicating processors, and sometimes you can do it very elegantly, depending on the conditions under which the recursion should terminate.

    Which is exactly what we see in evo-devo, and we can even see it with the naked eye—beautiful nested branching structures in two and three dimensions, and often elegant means of generating them with simple local computation and simple inter-processor communication. Given a massive SPMD machine with no local recursion, that’s exactly what you’d expect to see—ugly local hacks where looping or recursion would be better, and often elegant distributed algorithms that do recursion in a straightforward, spatial way.

    You see close analogues of the latter in human-built SPMD machines. Intelligent humans often solve computational problems in much the same elegant ways that dumb evolution solves physical-and-computational problems.

    Consider a “supercomputer” with 4096 processors connected in a 2-dimensional 64 x 64 grid network, and physically arranged more or less that way for physical reasons. (E.g., so that processors connected to each other can always be connected with short wires, with short delay times, but the processors are not tightly packed into a small 3D grid which would be much harder to cool, because the processors near the center would tend to get cooked by heat from the processors around them.)

    Consider a program that consists of several major tasks that execute in parallel and don’t need to communicate with each other quickly or in large volumes, but each major task consists of smaller subtasks that must cooperate more closely.

    The big tasks can be far from each other in the computer without the communication delays between them being a problem.

    The subtasks of each big task should be located close to each other in the computer, so that their frequent communications with each other don’t suffer from long delays going far across the network, and don’t clog up the network all over the computer.

    So what do you do? You divide the grid into contiguous 2D regions whose processors compute the major subtasks, and if a major subtask has a similar division of labor, you subdivide that region into sub-regions that do different jobs, and so on.

    In effect, you are dividing the computer into something like virtual organ systems, and if appropriate, subdividing those into organs, and maybe tissues, and maybe down to cells, if you have enough processors. (But typically you bottom out at some level, and compute all the subtasks of a task at that level on a single processor. If you had billions of processors, like a good-sized organism, you often wouldn’t need to do that, and you’d see some programs with as many levels of nested differentiation as you do in organisms.)

    Now consider programs with a uniform recursive task structure—a branching tree—where the communication patterns at each level are similar, and most of the communication happens between adjacent nodes at the “leaves” of the task tree, and less happens at each level higher than that. (There are a whole lot of parallel programs like that, for all sorts of basic transforms like sorting and FFTs, and all sorts of domain-specific things, too.)

    Supposing your task tree is binary, with each node above the leaves having two subtrees, how do you distribute it in a two-dimensional grid?

    Probably like this:

    Put the top node at the center of the grid.

    Divide the grid into left and right halves, and put the two next-level nodes at the centers of those halves. Then divide each half into upper and lower halves for the level below that.

    Repeat that process, recursively dividing each subregion vertically or horizontally, in alternation, until you get down to single processors. Then put all the subtasks of each task at that level on one processor.

    Now draw a pair of lines from the center of your grid to the centers of its halves, and from each of those to the centers of their halves, and so on, to show the hierarchical communication traffic between the regions.

    What does it look like?

    It looks like a squared-off pair of lungs.
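
    If you want to draw it yourself, the placement can be sketched as a short recursive function. This is just one way to write it, under my own assumptions: the grid is square with a power-of-two side (like the 64 x 64 example), each node sits at the center of its region, and the function names are made up.

```python
def place_tree(x0, y0, x1, y1, split_x=True, edges=None):
    """Recursively bisect the region [x0, x1) x [y0, y1), alternating direction.

    Returns (center_of_region, edges), where each edge is a (parent, child)
    pair of center points, i.e. one of the lines you would draw.
    """
    if edges is None:
        edges = []
    center = ((x0 + x1) / 2, (y0 + y1) / 2)
    if x1 - x0 == 1 and y1 - y0 == 1:
        return center, edges  # a single processor: stop subdividing
    if split_x:
        xm = (x0 + x1) // 2
        halves = [(x0, y0, xm, y1), (xm, y0, x1, y1)]   # left and right
    else:
        ym = (y0 + y1) // 2
        halves = [(x0, y0, x1, ym), (x0, ym, x1, y1)]   # the two halves along y
    for half in halves:
        child, _ = place_tree(*half, split_x=not split_x, edges=edges)
        edges.append((center, child))
    return center, edges


def grid_tree(side):
    """Lay out a binary task tree on a side x side grid (side must be a power of two)."""
    if side < 1 or side & (side - 1):
        raise ValueError("side must be a power of two")
    return place_tree(0, 0, side, side)
```

    Calling grid_tree(64) gives the top node at the center of the grid and a list of line segments. Plotting those segments with whatever drawing tool you like shows the branching pattern.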

    If we did it on a massive 3D-grid computer, it would look even more like a pair of lungs, fractal-like 3D branching and all.

    There are often close analogues between spatial arrangements of computations in a large SPMD computer and spatial arrangements of physical things in organisms.

    There are deep reasons for that.

    One basic one is that they are both actually large-scale SPMD computers, which both must solve spatial problems, which may have a regular recursive structure, or recursive levels of differentiation, or a mix of both.

    Large human-built SPMD computers often end up arraying computations in physical space in ways that mimic the leveled subdivisions of organisms, for a very fundamental reason.

    Like organisms, computers are physical objects in 3D space, with three-dimensional design constraints, which affect the performance of many programs, whether or not those programs are computing about spatially arrayed things.

    This forces programmers of large SPMD machines to solve many computational problems in much the same way nature solves physical ones—exploiting both regular and differentiated structures of problems by arraying subproblem structures in nested physical regions.

    Because computers are physical objects, limited by spatial constraints, high-performance computing often depends crucially and fairly directly on the physics of 3D space.

    And often that is the biggest problem in computation-intensive problem solving, dominating the basic program design. If you need enough computational power that you need a massive SPMD machine in the first place, you usually have to solve problems in ways that are analogous to recursive and/or differentiated substructures of biological organisms. You have to because computation is always a physical process happening in physical space.

    When you realize that, it shouldn’t be surprising that program structures are often eerily similar to structures in biological organisms. It should only be expected.

    Computer scientists have reinvented a lot of the same solutions to physical problems that nature found hundreds of millions of years ago, because computation and biology are mainly about the same thing, in a fundamental sense: how to efficiently get complicated shit done in space.

    I don’t want to try to make any particular useful prediction right now—certainly not definite ones—but I think those clear and fundamental similarities strongly suggest that computer science isn’t irrelevant to figuring out biology. They solve many similar problems in many similar ways, because they have to.

  57. johnharshman says

    Paul,

    OK, I can see that your flagellate is modeling its environment, sort of. Did you intend that merely as an example of modeling? Because it has nothing to do with the genome, except in the trivial sense that it involves proteins and other products of metabolism mediated by proteins. I can see that you might call this particular stimulus and response system a computer, though I still don’t see what you learn by doing so. And it would have been nice if your example fit your thesis about genomes, because it still isn’t clear what the genome models.

    (I will add that if you’re looking to cut superfluous material, most of your flagellate example, and all of the Roomba stuff, was pointless. You could as well just have had a couple of sentences talking about responses to stimuli.)

    Now why do I resist the notion that the genome is a computer program? Well, partly from the notion that a program is executed one instruction at a time; but of course there can be massively parallel computing, so we should dispose of that one.

    It’s also from the notion that a computer and a computer program are two separate things: the program is what is executed and the computer is what executes it. I don’t think we can make that distinction with the cell, and your attempts to do so seem to have resulted in great confusion. This is why I ask what’s part of the program and what’s part of the computer, assuming that a single thing can’t be both. Your answers have been very confusing, most recently the claim that though the genes (why not the genome?) are the program, a hypothetical RNA-based transcription factor would also be part of the program. But a protein-based transcription factor would apparently not be. What the heck?

    Then again, if you want to talk about regulatory networks as programs, then you will have to include RNAs, proteins, and many other parts of a cell. The regulatory network is not contained in the genome.

    Further, you say that something that acts purely as a template isn’t a program but data; but that’s most of what the genome does, as I’ve mentioned before. I don’t know what “the value of a variable in the program” would be for this genome-program you’re talking about. Or what a variable would be, for that matter.

    You may think you’ve distinguished between the gene program and the cell’s computer, but I don’t think you have. As I’ve said, I don’t think it’s possible to separate the cell’s contents and processes into program and computer. And that’s one reason I wouldn’t call the genome a program.

    I’m also afraid I don’t immediately see anything in your response to my request for prediction that has any real explanatory power, either pre or post hoc.

    You say that there is no recursive function definition within a cell, and that makes sense, because I don’t even have an idea what, biologically or chemically, that would mean. Not only are there no recursive functions, I don’t really see any functions to point to. You mention a programming language of the genome; what is that language? I think the reason there are no loops is that loops are something you see in computer languages and the genome isn’t a computer language.

    You say that things you see in regulatory networks (again, not in the genome, but in the networks, which are interactions among many components) remind you of the products of naive programmers working with simple languages. But that doesn’t mean these networks are programs.

    Any similarity between the process of evolution and the work of computer programmers is also not an argument that the genome is a program. An analogy between evolution and programming is not an analogy, much less an identity, between the genome and a program. That’s a confusion of levels.

    If you show that the cell is a computer (at least in some aspects), you still have not shown that the genome is a computer program, which was your initial task. Nor do I find your argument compelling that the spatial arrangements of SPMD computers and biological organ systems are similar because they both happen in 3D space; while it’s true that certain physical problems are in common and both have similar solutions, this is not at all relevant to your central claim. The same is true of, for example, large buildings, and large buildings are not computers.

    In short, I don’t think you have addressed your central assertion at all so far, and none of your postdictions are relevant.

    So, what can you do? I would return to what I would now consider the most immediately important questions:

    1. What parts of a cell are the program? What parts are the computer? Can cellular parts be separated into two classes such that they are mutually exclusive? (We may allow that some parts fit into neither class, but ignore those for the moment.) If the computer and program are not separate entities, is this a problem for your thesis?

    (Note that for simplicity I am sticking to individual cells. You might even restrict yourself to single-celled bacteria for simplicity, assuming you think that bacterial genomes are computer programs too.)

    2. In deciding that the genome is a computer program, what do we gain? Does it lead us to greater understanding, and if so how and of what? What can we do with this understanding? What predictions, if any, can we make on this basis?

  58. Paul W., OM says

    johnharshman,

    This is pretty frustrating.

    When I explained the flagellate and roomba, I was trying to clarify something you asked about, and preempt some common, naive misunderstandings which you seem to exhibit.

    I was giving basic examples of “modeling and controlling,” or “monitoring and controlling,” because you asked, and to illustrate several things you don’t seem to get.

    Stop with the telling me I can cut “irrelevant” stuff that in fact answers your questions, or illuminates crucial issues you misunderstand.

    Stop with the telling me what isn’t like a computer programming language, or isn’t like a computer. I’m a programming language designer and implementer, and sometime hardware designer. You likely have some software and hardware I designed in your pocket right now, if you have your pants on, and if not, it’s probably on your desk.

    Please try to give me a little benefit of the doubt, e.g. that maybe I have a pretty good idea what is or isn’t a programming language, and that maybe my analogy to carry bits already addressed the issues you raised about intermediate RNA transcripts and what is or isn’t “part of the program” or “part of the computer” in a way I didn’t spell out enough because you keep telling me not to be verbose.

    I may be misunderstanding you, but you seem to be unwilling to accept that a biological computer is really a computer unless it has certain characteristics that are in fact unnecessary for being a computer, and even some that no physical computer has or could have.

    I can’t help you there. I guess computers just don’t exist.

  59. johnharshman says

    Paul,

    I don’t think I misunderstand the things you think I misunderstand. If you don’t want to respond to me, don’t respond. If you do, try to answer the questions I ask. I’m sure you know what is and isn’t a programming language, but you need to communicate that to me.

    I’m not sure what characteristics you mean here:

    you seem to be unwilling to accept that a biological computer is really a computer unless it has certain characteristics that are in fact unnecessary for being a computer, and even some that no physical computer has or could have.

    Pretty sure computers do exist. But is the genome a computer program? That’s the question.

  60. Paul W., OM says

    johnharshman:

    One of the things that you seem to be unwilling to accept in a biological computer is that some things involved in the computation are neither program nor computer, e.g., you think it’s a problem if RNA isn’t part of the program, and isn’t part of the computer. You act like that’s a crucial, killer question, which just reveals that you don’t know what a computer is and have no idea how it works.

    You seem to think that RNA being neither program nor computer would be somehow not like a real computer, but you are mistaken. It’s completely irrelevant.

    Before I explain that, let me address the issue of whether the genes themselves are “part of the program,” or “part of the computer.” That may let you clarify your weird questions about “separating the program from the computer,” which in CS terms, I’ve already done.

    The genes are the program, and while we’d normally say they are not part of the computer, but are inside it, there’s also a perfectly reasonable and clear physical sense in which they are of course, obviously part of the computer, too—they are certainly physical parts of the physical mechanism that does the computing, which includes the rest of the computer operating on the program instructions.

    Genes made of DNA are “part of the computer” in that basic physical sense, and “not part of it” in the usual sense.

    Consider a program represented as a deck of punched cards—like genes, that’s a collection of discrete, structured physical objects whose structures encode an instruction apiece—executing on a mechanical computer that executes the punch card program directly. (There have actually been such machines. I could make one for a forward-chaining production system, if I wanted to go to the trouble. That would be a fun hack, and wouldn’t be hard, but it would take a while to build and be slow.)

    In case you don’t know how punch cards work, the holes in the cards are detected by metal pins that either do go through the card (if there’s a hole) or don’t (if there’s not); and which combinations of pins go through when the card is “read” mechanically determines what operation will be performed.
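
    In software terms, reading a card amounts to something like the sketch below. The four-pin reader and the little operation table are hypothetical, invented only to show the idea; they don’t correspond to any real card format.

```python
# Hypothetical table: which combination of pins passing through selects which operation.
OPERATIONS = {
    frozenset({0}): "LOAD",
    frozenset({1}): "ADD",
    frozenset({0, 1}): "STORE",
}


def read_card(holes, n_pins=4):
    """Return the set of pins that pass through a card with holes at these positions."""
    return frozenset(pin for pin in range(n_pins) if pin in holes)


def decode(holes):
    """Map the pins that got through to an operation; unknown combinations do nothing."""
    return OPERATIONS.get(read_card(holes), "NOOP")
```

    The card is just a physical object with a structure, and the machine’s response to that structure is what makes it an instruction: decode({0, 1}) selects "STORE" here only because the table says so.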

    Is the deck of punch cards “part of” the computer? Clearly yes in a basic physical sense—it’s a key part of the overall computing mechanism. The cards move through the computer, and get in the way of pins, and are mechanical moving parts just like the pins and gears and pushrods and whatever that execute the instructions.

    But normally we say “no,” it’s “not part of the computer” because we call the relatively fixed part of the computing machinery “the computer” and the easily changed part (the card deck) “the program.” We just do that to emphasize the unique role of the program, not because it’s physically true.

    Genes are like that, too. We could say they’re part of the intra-cellular computer, but we normally wouldn’t. We’d call them the program, and the (rest of the) gene expression machinery the computer.

    There is no problem there, any more than there is in a punch-card controlled machine, or in any other programmed computer anybody ever built. All programmed computers represent the program physically, and so in a basic physical sense it’s always part of the machine.

    We just choose to think of it as separate from (the rest of) the machine, to emphasize that it’s the relatively changeable part, and the rest of the computer has a relatively fixed configuration.

    It’s just no big deal. It’s absolutely normal.

    Now for your question about RNA transcripts, which apparently seems very important to you in some way that baffles me.

    RNA transcripts are transient copies of parts of instructions, i.e., the “action” part of a condition-action production rule.

    As such, they are neither part of the program nor part of the computer, in the sense that the program is “not part of the computer.” They are not part of the relatively fixed mechanism, and are associated with the program. But they’re even less fixed than the program—they’re transient data used by the hardware during the execution of the program.

    You seem to think it’s weird, and that the whole program/computer idea falls apart, if there’s something in there that’s neither part of the program nor part of the computer.

    Again, if you think that, it’s only because you don’t know anything about computers. There’s lots of stuff like that in computers, under the hood.

    For example, it’s absolutely normal in a normal von Neumann computer for an instruction to be “decoded”—translated into a slightly different form before execution, and just like the RNA copy of the action part of a gene, the alternative form contains the same information, just in a format that’s easier for the rest of the machinery to work with. There’s a special piece of hardware (which I’ve mentioned before) called the “instruction decode unit,” whose job is to do precisely that.

    Translating instructions into an equivalent, easier-to-execute form before actually executing them isn’t unlike a computer at all—it is exactly like a computer. The presence of decoded (transcribed) instructions somewhere inside the computer isn’t a problem at all.
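
    To make the decode step concrete, here is a toy Python sketch. The 8-bit instruction format (2-bit opcode, 3-bit register, 3-bit immediate) is an assumption made up for this example and does not correspond to any real machine. The point is only that `decode` produces a transient, easier-to-use copy of what the stored instruction already says.

```python
from typing import NamedTuple

# Hypothetical encoding, invented for illustration:
# bits 7-6 = opcode, bits 5-3 = register number, bits 2-0 = immediate value.
OPCODES = {0: "LOAD", 1: "ADD", 2: "SUB", 3: "HALT"}

class Decoded(NamedTuple):
    op: str
    reg: int
    imm: int

def decode(word: int) -> Decoded:
    """Translate a stored instruction into an equivalent, easier-to-execute form."""
    return Decoded(OPCODES[(word >> 6) & 0b11], (word >> 3) & 0b111, word & 0b111)

def run(program):
    regs = [0] * 8
    for word in program:       # the stored program is only read, never changed
        d = decode(word)       # transient decoded copy, dropped after this step
        if d.op == "LOAD":
            regs[d.reg] = d.imm
        elif d.op == "ADD":
            regs[d.reg] += d.imm
        elif d.op == "SUB":
            regs[d.reg] -= d.imm
        elif d.op == "HALT":
            break
    return regs

# LOAD r1, 5 ; ADD r1, 2 ; HALT
PROGRAM = [0b00_001_101, 0b01_001_010, 0b11_000_000]
```

    The `Decoded` values are neither the program (`PROGRAM`) nor the fixed machinery (`decode` and `run`); they are short-lived data the machinery makes for its own use, which is the role I'm assigning to RNA transcripts.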

    And the same thing going on in a gene-expressing computer just makes my story of the genome as a program that much better.

  61. johnharshman says

    Paul,

    One of the things that you seem to be unwilling to accept in a biological computer is that some things involved in the computation are neither program nor computer, e.g., you think it’s a problem if RNA isn’t part of the program, and isn’t part of the computer. You act like that’s a crucial, killer question, which just reveals that you don’t know what a computer is and have no idea how it works.

    No, I have never said any such thing, and in fact have allowed for the possibility that there are bits that are neither computer nor program. What I’m actually wondering about is something quite different, which is whether we can separate the parts that are computer from the parts that are program, and if we can’t whether that’s a problem. You are seriously misreading me here.

    None of what you say in this latest post is relevant to any issue actually between us. You are raising non-issues and ignoring the real ones, if there are any real ones.

    As for what’s the program and what’s the computer, apparently only the genome itself is the program, and that transcription factors are not part of the program, but are program data, which isn’t part of the program. Is program data not part of the computer either? What else would constitute program data? Signalling molecules that aren’t transcription factors?

    One source of my confusion might be your flagellate example, in which you describe its behavior as a program, despite the fact that the genome isn’t involved directly. Perhaps you mean that there’s more than one program in a cell. If the flagellate’s behavior is a program, what makes up that program?

  62. Paul W., OM says

    2. In deciding that the genome is a computer program, what do we gain?

    If it’s true, at a bare minimum we get the truth.

    A lot of biologists are just flat wrong about this, and IMO being wrong is bad. Biologists shouldn’t go around saying that the genome isn’t like a computer program if it actually is one, should they?

    Can’t we at least agree on that?

    Beyond that, even if somehow it didn’t yield any useful new insights, it’s a fucking cool truth.

    Evolution invented production systems before mathematicians did.

    Evolution invented automatic computing.

    Evolution invented programmed automatic computing.

    We can see how many specific things humans have invented for computing correspond interestingly to things evolution invented for keeping us alive.

    Evolution is pretty fucking cool if it can do that, isn’t it?

    Not calling things like computers by their right names, and even denying that the names apply… well, that sucks, doesn’t it?

    Imagine that biologists objected to physicists saying that organisms were physical systems, and said that no, they’re not much like physical systems at all.

    Beyond that, even if we didn’t learn anything different by calling natural biological computation computation, we ought to call a spade a spade and a shovel a shovel, or biologists will inevitably reinvent a whole lot of basic terminology in order to describe a computer and its computations in “non-computery” terms.

    For example, I can talk about gene expression using a bunch of standard computer science terms, all precise and at exactly the right level of abstraction to talk about computation as computation, rather than mixing up levels of abstraction in a confusing way, as usually happens when people cobble up new terminology for old ideas in a domain-specific way.

    Mathematicians and computer scientists have very good and clear terms for a whole lot of things that come up in understanding the operation of the genome: (1) productions (rules), (2) working memory of a production system, (3) conditions on the (4) left hand side of a production, the (5) action on the (6) right hand side of a production, (7) matching of conditions to enable the (8) firing of a rule, (9) asynchronously and (10) in parallel according to a (11) stochastic (12) schedule that is (13) probabilistically approximately fair.

    And we have names for things like (14) massively parallel and (15) distributed (16) Single Program (17) Multiple Data (18) programs running on (19) local (20) processors that execute productions (21) concurrently by default, and with (22) high but limited local parallelism, and communicate via both (23) nearest neighbor messaging and (24) broadcast messaging.

    We also have names for things like (25) analog computing devices used for both (26) analog and (27) digital computation in (28) hybrid computers, and (29) stochastic discrete rule firing in the machine language at a short (30) timescale and a low (31) level of abstraction to implement (32) noisy analog (33) scalar (34) program variable values at a larger timescale, and often using (35) feedforward and (36) feedback between rules to make variable values (37) and rule firing rates (38) bistable, so that they function as (39) boolean-valued variables and boolean logic at a larger timescale and higher level of abstraction.
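
    For readers who haven't seen a production system, here is a minimal Python sketch of the first group of terms: rules with a condition and an action, a working memory, matching, and stochastic firing. The gene names, factor names, and thresholds are all invented for illustration, and picking uniformly at random among enabled rules is just one simple way to get a roughly fair schedule.

```python
import random

def bump(wm, key):
    """Action helper: raise the (hypothetical) concentration of one factor."""
    wm[key] += 1

# Each production is (name, condition, action). The condition is the
# left-hand side, tested against working memory; the action is the
# right-hand side, which updates working memory when the rule fires.
RULES = [
    ("geneA", lambda wm: wm["signal"] > 0,              lambda wm: bump(wm, "A")),
    ("geneB", lambda wm: wm["A"] >= 3,                  lambda wm: bump(wm, "B")),
    ("geneC", lambda wm: wm["A"] >= 3 and wm["B"] == 0, lambda wm: bump(wm, "C")),
]

def step(wm, rng):
    """Match every condition, then fire one enabled rule chosen at random."""
    enabled = [rule for rule in RULES if rule[1](wm)]
    if not enabled:
        return None
    name, _condition, action = rng.choice(enabled)
    action(wm)
    return name

def run(steps, seed=0):
    rng = random.Random(seed)
    wm = {"signal": 1, "A": 0, "B": 0, "C": 0}   # working memory
    fired = [step(wm, rng) for _ in range(steps)]
    return wm, fired
```

    Note that there is no program counter and no fixed order of execution: which rule fires next depends only on what is currently in working memory and on chance.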

    I could go on and on, and list about 100 relevant terms from math and computer science as fast as I can type them, and that’s without even starting on terms from software engineering, signal detection theory, process control theory, graph theory, and several other related fields—each of which I could reel off at least a couple of dozen more good basic terms from, that will inevitably be useful in describing how the genome works at various levels of abstraction and functional organization.

    Biologists are inevitably going to have to reinvent all that relevant shit to properly and fully describe the genome and how it works—stuff that already has well-thought-out basic theories, and lots of sane, appropriately abstract domain-independent terminology, in several closely related fields that do talk to each other and often agree on terms.

    Does it lead us to greater understanding, and if so how and of what?

    I think it does, but for now, just let me ask you this:

    If I’m right and all those standard terms above apply to genomics and evo-devo, and hundreds of others I didn’t list—do you seriously think that none of the actual theories those terms embody are going to be usefully applicable, and offer some insights into biology? Seriously?

    Imagine somebody inventing their own mathematics-like theory to describe what they’re doing, reinventing and renaming things like algebra and derivatives and integrals and such… and eventually having to develop a theory of things like ODEs or groups and rings.

    Wouldn’t you think there was something pretty fucked up about any field where people were so insistent on reinventing basic, generally useful concepts?

    Don’t you think maybe they ought to ask for a little help from people like, say, mathematicians?

    What do you think mathematicians would think of them, going off and doing everything their own way?

    Well, that’s about how I feel when I see biologists trying to make sense of the genome.

    Ya know, we’ve kinda been there and done some of that, and written books about it and things like that.

    Why waste that, and get all cowboy DIY about it?

    That’s not how science is supposed to work, is it?

    Like a mathematician watching scientists reinventing standard mathematical concepts and terminology, I may not know exactly how standard mathematics will apply to help with your domain-specific problems, in detail, but I can pretty well promise you, it’s going to be useful, somehow. Likely very useful.

    And actually, I think I have some reasonably clear ideas about some general approaches to figuring out the genome’s operation that probably will be pretty useful, with varying degrees of customization for the peculiarities of biological computing.

    But if I told you about them, you wouldn’t understand them, because you don’t already know the computer science.

    And clearly, given how hyperskeptical you are of these ideas, you’d just tell me I’m wrong, without knowing what you’re talking about, as you’ve been doing.

    So fuck that.

    If you don’t think the clear and striking postdictions I’ve described are promising suggestions that working this stuff out in more detail would likely result in useful predictions down the road, then you have a bigger problem than not understanding the relevant computer science.

    You don’t seem to understand science, or seem unable to in this particular context, for some reason.

    Postdiction and basic explanation generally precede prediction in the scientific process. How many clear, useful predictions did Charles Darwin make in the Origin?

    I’m no Darwin, but this is kinda like that.

    Even if the “analogy” to a SIMD computer was “just an analogy,” it seems to me a clearly interesting analogy, worth exploring. Even if the “local processor” that does gene expression was somehow not literally a computer, the SIMD style of coordination of events—whether you call them computational events or not—would likely make a productive analogy.

    Humans programming parallel and distributed computers have come up with a lot of techniques for organizing complicated shit distributed across a large number of concurrent and physically arrayed units. Many of those techniques are actually simple, and rely on bottom-up emergence of organization from simple local actions—the kind of thing evolution is good at inventing too.

    Even if that were just “an analogy,” you should at least not be so hyperskeptical about whether it’s likely to be a useful one. Techniques for engineering and analyzing large collections of concurrent, communicating units are likely to be useful for both.

  63. Paul W., OM says

    johnharshman:

    As for what’s the program and what’s the computer, apparently only the genome itself is the program,

    Yes. I’ve said that over and over.

    and that transcription factors are not part of the program, but are program data, which isn’t part of the program.

    Yes. (Assuming that by “transcription factor” we mean a product of gene expression that is used to control other genes by docking at enabling or disabling sites in control regions.)

    Is program data not part of the computer either?

    Right, at least in the sense that the program isn’t part of the computer. We normally distinguish between the program, the program data that it operates on, and the computer. But of course in basic physical fact, they’re all part of the computer, somehow.

    When we start talking about the hardware, though, there’s another kind of data, used by the hardware on behalf of the program, but not part of the program.

    The values of carry bits in addition, for example, do not show up in the program, and computer architects may use carry bits to implement addition, or something else. (E.g., table lookups in a small super-fast memory unit for short numbers, or fast “parallel prefix” tree-structured hardware schemes that use more intermediate bits than simple carries, but don’t have to wait for the carries to propagate all the way up all the places.)

    Those partial intermediate results are below the level of abstraction of the program, even a machine code program. They are not supposed to affect the course of the computation, except of course by helping addition get computed correctly, so that the right number shows up as the value of a program variable.
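
    Here is a small Python sketch of that point. `ripple_add` mimics a simple bit-by-bit adder with an explicit carry; `direct_add` is a stand-in for any other hardware scheme that gets the same answer differently. Both function names and the 8-bit width are assumptions for the example.

```python
def ripple_add(a: int, b: int, width: int = 8) -> int:
    """Add two unsigned numbers one bit at a time, as a simple adder would."""
    result, carry = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        s = x ^ y ^ carry                    # sum bit for this place
        carry = (x & y) | (carry & (x ^ y))  # intermediate carry, internal only
        result |= s << i
    return result                            # the carries never leave this function

def direct_add(a: int, b: int, width: int = 8) -> int:
    """Stand-in for a different implementation with no explicit carry bits."""
    return (a + b) % (1 << width)
```

    A program that calls either one sees only the sum in its variable. The carry values exist, and matter physically, but they sit below the level of abstraction at which the program is written.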

    When you transcribe DNA to RNA, and only use the RNA to generate a protein, the RNA is like those carry bits or whatever—it doesn’t affect the course of the computation, e.g., which instructions get executed, except of course to help generate the right protein.

    An RNA that serves a computational function directly, by acting as a transcription factor itself, is a different matter. It does show up at the program level, because its concentration is a program variable, just like the concentration of a protein transcription factor.

    What else would constitute program data? Signalling molecules that aren’t transcription factors?

    Yes. For example hormones that can drift out of the cell into the bloodstream act as “broadcast” (one-to-many) messages in our SIMD machine. Other chemicals that signal adjacent cells, like morphogens, act as “nearest neighbor” messages in our SIMD machine. That’s just what you’d expect in a SIMD machine.
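
    A minimal Python sketch of the two kinds of messaging, assuming a one-dimensional row of cells that all run the same program. The field names (`hormone_out`, `morphogen_out`) and the trivial program are placeholders invented for the example, not a model of any real signalling pathway.

```python
def program(cell, neighbor, broadcast):
    """The same rule set runs in every cell; only the local data differ."""
    state = dict(cell)
    state["heard_neighbor"] = neighbor > 0     # signal from an adjacent cell
    state["heard_broadcast"] = broadcast > 0   # signal from anywhere in the body
    return state

def step(cells):
    """One synchronous round of messaging across the whole row."""
    # Broadcast (one-to-many): every cell sees the total hormone released.
    broadcast = sum(c["hormone_out"] for c in cells)
    new_cells = []
    for i, cell in enumerate(cells):
        # Nearest neighbor: a cell sees only what adjacent cells emit.
        left = cells[i - 1]["morphogen_out"] if i > 0 else 0
        right = cells[i + 1]["morphogen_out"] if i < len(cells) - 1 else 0
        new_cells.append(program(cell, neighbor=left + right, broadcast=broadcast))
    return new_cells

def make_row(n):
    return [{"hormone_out": 0, "morphogen_out": 0} for _ in range(n)]

row = make_row(5)
row[0]["morphogen_out"] = 1   # cell 0 signals its neighbors
row[4]["hormone_out"] = 1     # cell 4 releases a hormone
after = step(row)
```

    After one round, only cell 1 has heard the morphogen from cell 0, while every cell has heard the hormone from cell 4.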

    One source of my confusion might be your flagellate example, in which you describe its behavior as a program, despite the fact that the genome isn’t involved directly.

    I was just using it as an example of an interesting modeling and control computation, which illustrates several things. I was not claiming that the computation is done directly by the genes, and wouldn’t be surprised if it’s done by a hybrid (analog and digital) circuit of some sort built by the genes, and able to function autonomously in real time.

    I assume that a lot of that goes on in the cell, like an old GM factory built before digital computers were cheap, with a lot of the machinery run by fixed analog or hybrid circuitry of various sorts (electrical, mechanical, or hydraulic), and all of those things more or less controlled by a central programmed computer.

    Perhaps you mean that there’s more than one program in a cell. If the flagellate’s behavior is a program, what makes up that program?

    So far as I know, there is one main programmed computer in the cell, i.e., “the” computer with “the” cell’s genes, and a number of miscellaneous non-programmed analog or hybrid computers monitoring and controlling various chemical processes and nanomachinery.

    There is at least one other kind of programmed computer in there, though—the gene expression computer in the mitochondria, running (expressing) the mitochondria’s DNA.

    If there are any other organelles in the cell with their own genes, they’d count as programmed, too.

    (And in principle, the genes don’t have to be DNA; they could be RNA if you built a programmed computer using RNA to represent instructions. So far as I know, there’s no such computer in the cell, but it would be cool to find out there is—especially if it was independently programmed like the mitochondrial computer.)

    The flagellate example was meant as a simple example of something that counts as computing—remembering one scalar value, comparing it to a new one, and switching from one mode of operation to another, or not, depending on the result of that comparison.

    No matter how the flagellate does it—with genes, or with a fixed analog circuit that is only bistable and thus boolean at the highest level, it’s still a computation, and it’s the same computation in a very basic sense.
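
    Written out as a Python sketch, the whole computation is about this big. The class name and the mode labels "run" and "tumble" are my own placeholders for the two modes of operation, and the rule that an unchanged concentration counts as "keep going" is an assumption.

```python
class Flagellate:
    """Remember one scalar, compare it to a new one, choose a mode."""

    def __init__(self):
        self.previous = None   # the single remembered concentration

    def sense(self, concentration: float) -> str:
        if self.previous is None or concentration >= self.previous:
            mode = "run"       # not getting worse: keep the current heading
        else:
            mode = "tumble"    # getting worse: switch modes and reorient
        self.previous = concentration
        return mode

cell = Flagellate()
modes = [cell.sense(c) for c in [1.0, 2.0, 1.5, 1.5, 3.0]]
```

    There is no plan, no map, and no goal representation in there: one stored number, one comparison, one boolean outcome. Whatever path the organism ends up taking comes from repeating this against the environment.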

    Many biologists really don’t seem to get what computation is. It’s stuff just like that. If something’s function is to process information-bearing inputs and yield an information-bearing output, and you use mechanical processes in between that model something important about the domain of operation to come up with the right output, it is a computing device and what it’s doing is exactly computing.

    Another reason I like the flagellate example is that the behavior of the flagellate is radically dependent on interaction with its environment, and very little environmental structure is directly modeled in the computer.

    The flagellate doesn’t make a multi-step plan, top-down, and maintain representations of its goal state, the path it’s chosen to the goal, and so on.

    To a lot of biologists, just like a lot of IDiots, that doesn’t seem like a computation at all. They think computation is complicated and organized and executed top-down like a structured program.

    But computation is generally not like that.

    Even programs written in a top-down fashion in a structured programming language are mostly not like that, if you’re looking at the hardware operations. They may be designed top-down, and even execute top down in some sense (e.g., with nested procedure calls), but in more important senses, all programs run bottom up, with the illusion of high-level control emerging from the interactions of low-level parts.

    That is especially true at very low levels, within a hardware processor, and at high levels in a distributed processing system—both of which have very strong analogues in biology.

    Biology isn’t unlike computing because it’s mostly bottom-up. It is very like computing because it’s mostly bottom-up.

    The Roomba example is interesting for several reasons, especially for dispelling related misconceptions on the part of both biologists and IDiots, but also in some ways I haven’t pointed out—including one that is an answer to your question about why I don’t just talk about stimulus-response mappings.

    I have a good answer to that question, and did before you asked, because it’s an important question for illuminating what it means for something to literally be a computation, or to literally be a computer, and ultimately, why all this stuff should be discussed in literal CS terms. (It’s straight out of CS Theory 1, before you get to production systems and Turing machines.)

    If nothing else, it shows that wildly environment-dependent, reactive, bottom-up emergent behavior like the flagellate’s can be controlled by something that is obviously just a computation, because I can describe the simple program running on something that everyone agrees is a computer.

    Then, if nothing else, people should see that whatever’s going on in the flagellate is very, very similar—it’s something at the very least peculiarly like simple computation, if not “literally” computation.

    In fact, anybody should be able to see that if my description of the flagellate’s stimulus-response mapping is correct, and if its internal decision-making mechanism works the way I say—with the two scalars and one comparison—then it isn’t just implementing the same input-output mapping, or stimulus-response mapping.

    It’s actually computing the input-output function in the same way, at a certain level of abstraction—there’s some aspect of something in the flagellate that “remembers” the earlier chemical concentration, something that inputs the new concentration, and something that compares the two to trigger one or the other event.

    Now I can put on my CS professor hat and tell you something definitive.

    In CS, we say that if you can find any level of abstraction in the real system where you can find such an input-output mapping, and you can find the right causal structure at that level of abstraction, such that actual events that count as the inputs actually cause actual events that count as the outputs, then several things are true:

    1. The physical process you’ve identified is formally equivalent to a computation in a computer.

    2. Given the actual causal power in the system, such formal equivalence means that the physical process isn’t just equivalent to an analogous process in a computer…

    It literally is the same computation, and it really is being computed, and whatever causal mechanisms do it in the real system really are a computing device that computes that function according to that computation.

    That’s literally all it means to literally be a computation, or to literally be a computing device.

    I left out a couple of constraints. One is that the system be stable, in some important sense at some important timescale over which you want to consider it to be a computing device. It’s not sufficient if there happens to be some process that is analogous to that comparing-two-scalars-outputting-a-boolean in a one-off way. The system in question has to be stable in such a way that it can wait for input, and then react in the right way to generate the output.

    But biological “computers” have that too. That’s what something being alive is largely about—maintaining a stable enough organization that the system can react to internal and external events appropriately, and keep on doing so for a good long while.

    Essential aspects of being “alive”—homeostasis, healing, etc.—are there largely so that living things can compute the right stimulus-response mappings in the right regular ways… and keep on doing so for a good long time.

    Life is literally largely a matter of computing the right internal and external behaviors, and maintaining and protecting the stability of the computer(s), so that it can keep computing those life-preserving functions.

    Without literal computation, there would be and could be no life. Life is intrinsically computational. If it wasn’t largely computational, in the precise technical sense above, it wouldn’t be life, either.

    By the way, I’m far from the first to say that.

    One of the smartest people who ever lived, John von Neumann, famously said it decades and decades ago.

    von Neumann was not distracted by the dissimilarities between biological processes and von Neumann computations. He certainly knew that there are very many and very varied ways for all kinds of physical systems to literally compute, and to be computers, and that von Neumann machines were no more “computerish” than any of the others.

    IIRC, von Neumann also said that the genome was evidently literally a computer program for such a literal biological computer, literally programmed to perpetuate itself. (Though so far as I know, he never understood genetics and gene expression well enough to recognize gene expression as specifically rule firing in a stochastic production system computer. I think he simply died too young, of cancer, in his 40’s or early 50’s.)

    A lot of biologists are willing to admit that you likely could simulate gene expression on a computer, and maybe you could model it as a production system, but think there’s a basic difference between being able to model one thing as the other, and the one thing really being the same as the other.

    The thing is, when it comes to computers, that’s just flat false.

    Being “a working model of” a computer is sufficient for literally being a computer.

    And the same goes for being specifically a production system, or a programmable production system.

    If you can make a working model of gene expression as a production system—even in software on a “normal” computer—and then program it accordingly, by changing genes and whatnot, then you know a couple of important things:

    Your software simulation of gene expression literally is a programmable production system as well, irrespective of it being implemented in software on another computer, and

    The biological system it models is literally a programmable production system too.

    That is a deep and utterly basic point about real computers that very few biologists understand.

    Once you establish that formal equivalence between two stable causal systems, and one of them is known to be a computer (or specifically a production system computer, or a programmable computer, or whatever), then so is the other, really really, and that is simply the end of the discussion.

    At that point, the question is utterly settled, and anybody who thinks otherwise simply doesn’t know what it means to literally be any of those things.

    Biologists generally expect there to be something else to really being a computer, but there just isn’t. That’s all there is to it, so in many ways, it’s not nearly as radical a claim as it sounds.

    Everybody knows that organisms use mechanisms to map stimuli to responses, and people generally get the idea that the mechanisms that do that are doing something “like” information processing—filtering inputs to extract relevant information, using them to pick appropriate responses, and so on.

    And everybody knows that you can “reprogram” the cellular “computer” by genetic engineering, in an intuitive sense—genes control the operation of other genes, like a network of switches or analog devices and all that, and you can just edit the text-like representation to change those control relations, to change things in principled ways. (In principle, anyhow.)

    What people don’t seem to realize is that they should drop all those annoying scare quotes, because they’re actually dealing with the real thing.