The chirality issue

Note: the second figure has been replaced with the correct one, twice. A xylulose bond was in the wrong direction. This is hard.

There are a couple of other origin of life issues that I haven’t focused on yet. One that I was hoping would reveal itself is the issue of “chirality”, or the side of a molecule a bond or atom is on relative to the rest of the molecule in 3D. The strict definition includes things with identical chemical formulas having non-superimposable structures.
I don’t tend to draw in 3D to simplify things. I also don’t include hydrogen to simplify things.

I meant to address this in a previous post where I posted this figure of thiamine and parts of the pentose phosphate pathway. It’s why there is a single hydroxal on xylulose-5P drawn in 3D as if it’s coming at you out of the page. I’ll get to that at the end.

I’ve drawn ribose and xylulose in 3D. Both looking down the length of the molecule and from the side. Carbon actually binds 4 things and is tetrahedral like a pyrimid. So I put the hydrogens back into the upper structures and used conventional dashed and filled wedges to represent bonds going into and out of the plane of the page respectively.

Each of those hydroxals could be where the hydrogen is on the carbon instead. If you flipped the hydrogen and hydroxyl on ribose you can see how the formula would be the same but the structures would not be superimposeable. Things on different sides are designated with R/S or D/L distinctions that refer to single atoms on the structure. D-ribose refers to the stereochemistry around the atom farthest from the aldehyde group in the form of ribose used by life.

Amino acids are all L- relative to the side chains aside from glycine which has no atoms with stereochemistry.

So why is life one way and not the other when it comes to these bond directions? I don’t know but I noticed that the only sugar with a bond in the other conformation is xylulose-5P so I drew that one bond with 3D on the thiamine figure. The rest are in the other confirmation. And xylulose is the donor for the pentose phosphate pathway 2C molecules. This isn’t a solution to the chirality problem but it is a clue.

Ancestral fragment?

This post is going to look at something I found from the results of 2 studies looking at the ribosome, the molecular machine that translates the genetic code into a string of amino acids that fold into the 20,000+ proteins that build and make us.

The first paper Evolution of the ribosome at atomic resolution peels the large subunit ribosome apart stem-loop by stem loop. They have a second paper with the small subunit which I took some figures from. A stem loop is a loop shaped structure where you have a base paired nucleotide “stem” and above that the rest of the structure forms a “loop”, often with more stem parts. These are how ribosomes tend to evolve. Sequence changes have more of a negative effect than insertion of something that adds new functions.

In the top portion of the figure from their second paper they compare a similar region of ribosomal RNA (rRNA) from different species where you can see how new loops are added.

They went through the ribosome and figured out what stem loops were added where and peeled additions away down to a first rRNA “hairpin”, named “AES1” for “ancestral expansion segment 1” in capital letters for the large subunit and lower case letters for the small subunit. I’m interested in the large subunit because it older. More recent segments are AES2…AES3…

This second figure shows the base pairing in the large subunit. The top is numbered by their expansion segments and the bottom is colored by 6 phases of evolution they identify. The part I’m interested in is #1 in the upper right by the “PTC” in the lower right.

AES1 is associated with the Peptidyl Transferase Center (PTC). The region that does the actual connection between one amino acid and another in creating a new protein.

The second paper The ribosome as a missing link in the evolution of life looked at the sequence of rRNA and did a search of other genes to look for similarities and found some very interesting things. I should emphasize that the rRNA is not transcribed into products as far as is known. But that may not have always been the case.

They find the transfer RNAs, tRNAs, that escort the amino acids to the ribosome. They find lots of parts of ribosome and RNA related proteins. They find ribosomal proteins, polymerases, and more.

From these data sets I identified the sequences associated with the oldest fragment of the large subunit, AES1, and went and looked at what was in it.

I found parts of 5 things:
1) GshA Glutamate–cysteine ligase
2) Protease 4
3) RecF a DNA replication and repair protein
4) PqqL a probable zinc protease
5) RplB/Ribosomal large subunit protein 2

Things that attach and cut amino acids, proteases. Interesting. A protein that binds single stranded DNA during repair and replication, RecF. And one of the first ribosomal proteins to bind the large subunit rRNA, one that accesses the PTC during translation, RplB. Very interesting.

Again but better.

I’m not really functional for human behavior related posting, but thinking about the molecules is a hobby and I’ve been able to improve on the drawing I did in my previous post. I actually might have enough origin of life material for 4 or 5 posts. This post will be limited to the overall patterns in the drawing and the most interesting thing I’ve found in the subdomains so far. All of these posts will be like collecting puzzle pieces to the origin of life.

After more time with my drawing I’ve decided that it’s better described as a metabolic alignment of everything related to DNA that I can fit into a canvas. The file is too large for me to post on WordPress so I linked a Google documents version again. To properly figure out biological origins the sequences of genes and proteins aren’t enough. The very metabolism of these things should be considered. I’ve been careful about the structure of the alignment. Even now I’m thinking about how it could be improved with more room or simpler representation. There’s only so much I can do to compact the information down.
It’s a complex set of interconnected notes useful to anyone interested in the origin of life. This one is even cleaned up and error checked.

Cells are relationships between nucleic acids and proteins. Ribosomes and a genome. They make everything between them with the help of some atoms other than CHONPS. That is complex, but not endless. Ribose, 5 bases (A G U C dT), and 20 amino acids.

There is a legend in the top left of the linked drawing. It shows that each molecule is positioned with the protein(s) that use it. So the molecule is the product of something tangential to it shown by arrows, and the MAIN product of the reaction carried out by the protein(s) is also tangential.
Positioned right to above the molecule are other reactants and catalysts, and positioned left to below are products of the reaction.
The name of the protein(s) is given with the length in amino acids. Below that is every identified subdomain in the protein in order, and the protein domain superfamilies they belong to according to the Evolutionary Classification Of protein Domains database (ECOD).
Occasionally the molecule is below a protein when it’s a product of something already present on the figure nearby, like 10- formyl tetrahydrofolate (thought to be a storage form of formyl groups).

There is also a list of molecules that make an appearance in the top right. From 1 to 5 carbons and larger molecules.

In addition to the Purines (A G) and Pyrimidines (U C dT) are:
The amino acids Histidine, Methionine, Phenylalanine, Tyrosine, and Tryptophan.
The cofactors Thiamine (vitamin B1), Flavin (vitamin B2), Coenzyme-A (vitamin B5), the vitamin B6 group (pyridoxal…), Niacin, Folate, S-Adenosyl-Methionine and Molybdenum Cofactor.
These are boxed in blue for the amino acid product or first usable form of cofactor. Molecules that follow this box are different forms of the cofactor used by proteins made from the basic form (folate, molybdenum cofactor, vitamin B6, flavin too but I haven’t included FAD yet).

A cofactor, or coenzyme is a molecule that participates in what the protein is doing. The protein is just the polypeptide part. You can think of them as tools that often have to be reloaded when they dispense things. ATP itself is a cofactor as “energy currency”. The amino acid glutamate is a cofactor for dispensing ammonia. Cofactors also act as platforms to build things (coenzyme-A), or move things to other things (Thiamine), or move electrons around with charge so chemistry can happen (vitamin B6).

Overall Patterns

The whole thing is roughly centered on the purine (A G) pathway going from left to right and bent downward halfway through. Ribose-5P sits in the upper left corner in a small red box. The purine pathway is boxed in red as well. The products are marked with a large black A and G.

The pyrimidines are lined up and go down where the purines bend down. This is deliberate because parts of both pathways involving the amino acid aspartate and bicarbonate (HCO3) are close to one another. Large black letters mark the products similarity to the purines. From there continuing to the right I have niacin, methionine (and SAM), and coenzyme-A allowing a similar grouping of aspartate in those pathways.

Thiamine is above the purines and I’ve lined its glycine up with the one in the purine pathway.

Under ribose-5P and traveling downward are the vitamin B6 group and “aromatics” which are tyrosine, phenylalanine, tryptophan, and part of folate called PABA. All from the same pathway. The B6 group uses ribose-5P and the aromatics use erythrose-4P so they made sense together and near ribose. 5C sugar, 4C sugar…

Histidine, folate, flavin, and molybdenum cofactor branch off from the ends of the purine pathway where A and G separate. They are made from A and G, where many of the previous molecules have A attached as part of the structure.
This whole section has the feelings of “partly deconstructed purines” to it, and deconstructed ribose in the case of tryptophan.

On the whole there is a feeling of there being an aspartate accumulation after early purine evolution (if the pathways can be read like that). Maybe aspartate was the first nitrogen delivery molecule. That it’s a part of the purine pathway, used for so many cofactors and amino acids, and delivers purine nitrogen is significant.

Purines and Pyrimidines, General Patterns.

To simplify discussion of some of this I drew a separate purine and pyrimidine pathway that just focuses of the overall patterns.

A simple alignment of the purine (bottom 3/4) and pyrimidine (top 1/4) pathways with gained and lost molecules included.

The purine pathway that makes Adenosine monophosphate and Guanosine monophosphate is unique in that it is constructed from smaller parts relative to all of the other pathways presented, outside of the ribose-5P. The biggest thing is glycine, the smallest amino acid, in the formation of the smaller 5-membered ring with ammonia and formate.

Aspartate, the amino acid used in making pyrimidines, temporarily joins in making the second ring. It attaches by it’s nitrogen and leaves as fumarate (tricarboxylic acid cycle). Bicarbonate (or carbon dioxide in later organisms) is the only other thing in the pathway left to mention. And the way it’s attached to the former glycine makes a 3-carbon segment. It’s a 2C-3C transition. Maybe it means something. Carbon fixation would be different things at different metabolic levels.

The pyrimidines have the opposite relationship to ribose-P as the purines. The ring is made first and then it is mounted on the ribose via PRPP, the form of ribose used to make nucleotides (and niacin, tryptophan, folate…). A bicarbonate and aspartate are combined and made into a 6-membered ring like the second ring of the purines. Getting aspartate and bicarbonate together in the figure makes sense when it comes to positioning the pathways together.

Both pathways make an intermediate that isn’t used for anything else. Inosine monophosphate and orotidine monophosphate. After that it’s smaller changes to get each of the final nucleotides.

The pyrimidines also lose parts of structure. Niacin removes 2 hydrogens and their electrons making a double bond. After orotate binds ribose it loses a carbon dioxide (decarboxylation). And the only place DeoxyriboNucleic Acid shows up is deoxythimidine, a loss of a hydroxal. That’s done with purines too but I didn’t need to show deoxyribonucleic acid until dT.

Maybe there’s a progression here with 5 ring ancestors and a 6 ring newcomer. Early base pairing between them. I certainly think accumulation of aspartate is a theme in both the evolution of pyrimidines and much more. In fact I think modern genomes and ribosomes are descended from something like the 5-membered thiazole in thiamine on a ribose maybe. With chemistry and base pairing between strands.

The purine pathway splits to thiamine biosynthesis at AIR, but the molecule is turned into the 6-membered ring of HMP. Not the 5-membered ring that moves carbon in making ribose (pentose phosphate pathway), the part that assists in the chemistry (more on that at the end)

Purines

I think there is a level where one can tentatively delete whole parts of proteins when thinking about precellular life in the vent. Formate, methane, bicarbonate, acetate, ammonia, phosphate were just tentatively available. There is sufficient literature support for these molecules in the vent from my point of view. As well as acetate, pyruvate, and fatty acids that make membranes.

Extreme accumulation of ammonia on electroreduced mackinawite: An abiotic ammonia storage mechanism in early ocean hydrothermal systems

Generation of long-chain fatty acids by hydrogen-driven bicarbonate reduction in ancient alkaline hydrothermal vents

Bio-inspired CO2 conversion by iron sulfide catalysts under sustainable conditions

Phosphate availability and implications for life on ocean worlds

I’ll only mention the most interesting thing I have found in the protein subdomains in this post.

Focusing on the purine pathway there is another maybe link to the origin of life. 3 polymers are important in life and origins: polynucleotides (DNA RNA), polypeptides (proteins), and fatty acids (~16-20 carbons, membranes and cofactors like biotin). Not only are nucleotides being made but step 2 of the purine pathway has a product, phospho-ribosyl-glycin-amide, that looks like a peptide (protein) bond on ribose between the glycine and the ribose-amide.

Top: purine intermediate PRA and an aspartate-glycine dipeptide. Middle-Bottom: the reaction PurD carries out and the glycine phosphate intermediate.

Maybe it doesn’t mean anything but get enough interesting things together and there is more room for thought.

The protein that does this step, PurD, is an example of something that I can cut a lot of things off of, just for fun. PurD is a member of the ATPgrasp family of proteins. These proteins bind an ATP, and pop a single phosphate off making an ADP.

When a protein uses ATP as “energy currency” in this way it either participates in a chemical reaction directly or it drives physical movement of a protein to do something else including chemistry. The metaphor is about doing work. Here the phosphate directly participates in attaching glycine to the nitrogen added in step 1 with a glycine-P intermediate.
The ATP-Grasp Enzymes

Structural biology of the purine biosynthetic pathway

Here is where the chopping out can happen. In the vent there is phosphate. The ATPgrasp family is essentially a bunch of phosphate dispensers and just for fun can be deleted in terms of what is more or less interesting here. That being said phosphate control seems to have evolved first/early
Structural Phylogenomics Reveals Gradual Evolutionary Replacement of Abiotic Chemistries by Protein Enzymes in Purine Metabolism

And the ATPgrasp family is related to the protein that puts aspartate on CAIR in the  purine pathway which is interesting.

That’s most of the protein deleted, except for the C2 domain. It’s an interesting little piece and it’s relatives are involved in some very origins related chemistry. At the Evolutionary Classification of Protein Domains database this domain is in the “Alpha helix + beta sheet complex topology” bin. And in there it’s in the “alpha/beta-Hammerhead Barrel-sandwich hybrid” X, H and T (homology, topology) groups.

That whole T group is filled with interesting stuff but the PurD C-terminal fragment is in the “CO dehydrogenase molybdoprotein N-domain-like” F or family group. It’s relatives include not only the above CODH MO-N fragment (1C+1C) but is also in biotin carboxylase (1C manipulation), acetyl-CoA carboxylase (1C+2C, on the drawing), and pyruvate carboxylase (1C+3C).

No where else do I see a numerical progression like that. Where did aspartate come from for pyrimidines? Today cells can use pyruvate carboxylase to make oxaloacetate in initiating gluconeogenesis (glucose synthesis), which is also made into aspartate. Interesting.

This “CODH MO-N fragment” is also in 3 other purine biosynthesis genes, PurT, PurK, and PurP. They involve formate, and bicarbonate. Carboxylations use bicarbonate. The fragment is also in L Amino acid Ligase (LAL). And it is present in a few proteins without the ATPgrasp part: xanthine dehydrogenase chain B, Carbon monoxide dehydrogenase large chain (so  2 parts of CODH), aldehyde oxioreductase, quinoline 2-oxidoreductase large subunit, and 4-hydroxybenzoyl-coa reductase large subunit. I don’t know what to think about the things in this paragraph yet but I mention them for completeness.

The last thought on this fragment is that the general shape of the carboxy HO-C=O, or even the carbonyl C=O itself is maybe what it is about. I need to look at protein structures.

Ribose

I’ll finish with some thoughts on ribose. I’ve been hoping the origin of ribose comes out of this. There are different variables to consider in the abstract. The polynucleotides are the most complex part of this. Only counting genomes ribose:
1) Has 5 carbons with hydroxals on 4 and an aldehyde on one end. Each oriented the same (chirality).
2) Polymerizes using phosphate on the carbon on the end opposite the aldehyde (#5) and 2 carbons over (#3). So a 3-carbon segment for the backbone.
3) Provides a site for nitrogenous base binding at the aldehyde carbon (#1). This also provides a site for base pairing between strands.

Each of those variables is modifiable hypothetically.

Maybe there were things with erythrose-4P or shorter in the backbone of the polymer. This still allows for base pairing. With glyceraldehyde-3P maybe you still get the same but no ring.

But it’s the predictability of the ring that evolution can respond to. There is a huge diversity in things that bind ribose or things with it. Erythrose is only used in a few other places outside of the pentose phosphate pathway and the pathway that makes part of folate, tryptophan, phenylalanine, and tyrosine. And a 2-carbon segment in the backbone might be too close for 2 phosphates, steric hindrance (physically interfering).

If you get larger than ribose then you worsen the problem of self splicing, poly-RNAs of a certain size looping out and removing part of themselves and sealing the break. Removal of the #2 carbon reactive hydroxal that is removed in Deoxy-ribonucleotide genomes is bad enough, 2 would allow for more chemical possibilities than the relative stability of 1.

Getting outside of genomes carbons #2 and #3 bind amino acids in translation with class 1 and 2 aminoacyltransferases. Maybe there’s evidence of an erythrose backbone and only carbon #2 binding amino acids in the past there, but probably not.
Carbon #3 also binds phosphate in coenzyme-A and niacin in its anabolic mode (the NAD/NADP difference).

Carbon #1 binds lots of cofactors, mostly through at least 1 phosphate as the drawing shows. It also binds lots of other things for transport like sugars, membrane components, and more. Carbon #1 also binds all amino acids during the process of aminoacyl transfer in creating transfer RNA loaded with an amino acid, again on one phosphate, aminoacyl-AMP.

These carbons are also used in making signalling versions of nucleotides like cyclic AMP and cyclic GMP. The signalling functions of nucleotides is an area I’ve yet to get into in depth and integrate into this.

Finally the metabolism of ribose on the figure may provide clues to its origin. In the cofactor pathways ribose is often partially dismantled with pieces leaving, such that 4 or fewer carbons from ribose remain in the final molecule. At the least it makes me think about the assembly of ribose and its own abiotic concentration.

3C+2C?
4C+1C?
6C-1C?
3C+3C with loss of OCO?

I think the first and maybe last follows the most likely routes. 3 comes after 2 and if 2 is already around… It’s maybe thiamine related.

Common ancestry between the purine and thiamine pathways is possible and likely given the use of AIR in making HMP. Attaching glyceraldehyde and glycoaldehyde could be a route. Now how would acetate and pyruvate become sugars? Something between oxaloacetate, gluconeogenesis and the pentose phosphate pathway maybe.