It's time for the bacterial evolution crowd to get their own toys

Molecular phylogenetics has revolutionized our understanding of biodiversity and evolution like no tool before it. The advent of using the divergence among the gene and protein sequences of different organisms as both a proxy for the biological species concept (at the species level) and a way to compare organisms with no obvious shared morphological or cellular features has resulted in the most sweeping change of how we classify the diversity of life since Linnaeus. Like any paradigm shift, however, the new data available opened up a Pandora's Box of new issues that needed to be grappled with.

One of the major discoveries that came to light was that prokaryotic organisms (often called "bacteria", but actually made up of two very distinct lineages, the Eubacteria and Archaea) are not quite what we thought they were. You see, Darwinism laid the foundation for how evolutionary biologists interpret the world, and while Darwin got a lot of things right, there were many things he had no idea even existed, that have shaped how we understand evolution today (as there will be many things we don't know about today that will shape how scientists in 150 years understand evolution). While Darwin's concepts for descent with modification fit the multicellular world pretty damn well, things get a little muddled in the unicellular world, especially when we leave the eukaryotic cells many of us know, and move to the bacteria.

Bacteria don't play by the rules that work so well for the multicellular among us, and the differences are at the heart of a scientific debate that has been playing out for two decades. In the early stages of molecular phylogenetics, it was thought that the history of life's evolution could be revealed if we compared the sequence of a ubiquitous gene across all taxa. The small subunit (SSU) of the ribosomal RNA was chosen as the appropriate gene because no life form lacks ribosomes, and because of its important role, the DNA sequence that codes for the functional RNA is highly conserved. As a result, the SSU is the most diversely sequenced gene around and is used today as the gold standard for comparison of bacterial and some eukaryotic species.

But there was a surprise in store when people started to sequence other genes from diverse bacterial lineages. The phylogenetic trees resulting from these new data did not match those of the SSU. In some cases, the differences were dramatic. It had been clear for some time that bacteria could exchange DNA, but the extent and potential evolutionary distance of the exchange took the community by surprise. Indeed, there are some bacterial genomes that appear to be a mixed bag of genes from diverse sources, such that their closest sister taxon cannot be identified with certainty (e.g. Zhaxybayeva et al. 2009). The evolutionary picture within and among bacterial groups is so confounded by lateral gene transfer (LGT) that the term "Tree of Life" has been abandoned by many in favor of some variation on "Web of Life".
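To make "the trees did not match" a little more concrete, here is a minimal sketch of one common way to put a number on it: count the clades (groups of taxa descending from one internal node) that appear in one rooted tree but not the other, a rooted version of the Robinson-Foulds distance. The taxon names and both topologies are invented for illustration and come from no real dataset, and the nested-tuple tree format is just a convenience assumed here.

```python
# Toy illustration only: taxa "A".."D" and both topologies are made up.
# A tree is a nested tuple; a leaf is a string naming a taxon.

def clades(tree):
    """Return (leaf set of this subtree, set of clades for all internal nodes)."""
    if isinstance(tree, str):  # a leaf contributes no internal-node clade
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the clade defined by this internal node
    return leaves, found

def rooted_rf_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two rooted trees."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must be built on the same set of taxa")
    return len(clades_a ^ clades_b)

# Hypothetical SSU tree versus a hypothetical tree from some other gene.
ssu_tree = (("A", "B"), ("C", "D"))
other_gene_tree = (("A", "C"), ("B", "D"))

print(rooted_rf_distance(ssu_tree, other_gene_tree))
```

A distance of zero means the two genes tell the same branching story. Anything above zero means at least one grouping is supported by one gene and contradicted by the other, which is the kind of conflict LGT can produce (though not the only possible cause).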

On top of the issues with LGT, there is the problem that bacteria just don't speciate the way eukaryotes do. The most recent issue of Biology & Philosophy (2010, 25(4)) is dedicated to the discussion of bacterial evolution and how it differs from that of eukaryotes, but the paper by Lawrence and Retchless (2010) really drives to the heart of the problem: we cannot use models fashioned after evolutionary patterns in eukaryotes to understand prokaryotic evolution because they speciate in fundamentally different ways.

This message has been repeated far and wide and several researchers are actively proposing novel models for use in prokaryotic systems (e.g. Bapteste et al. 2009). Despite this, paper after paper is churned out using traditional phylogenetic methods to try to classify bacteria using the same assumptions applied to eukaryotic systems. What the hell is going on?

To tell you the truth, I don't know. I think part of the issue is availability and acceptability. Tried and true phylogenetic methods are well known and reasonably well understood by a large community of people. If one is writing a paper or grant proposal, introducing controversial or novel methodology is one way to make the process far more difficult for yourself (a whole new thing for reviewer 3 to reject outright without understanding it!). Playing it safe means that you can cite the large body of literature that is also applying the same methods, in some sort of schooling-fish mentality. Another factor might be the radicalization of the LGT movement by researchers not willing to abandon the Tree of Life idea, for personal or political reasons. Rarely have I seen such vitriol unleashed at conferences as when the topic of rampant prokaryotic LGT comes up.

But the data are the data. It is abundantly clear that bacteria violate the assumptions inherent in the methodology currently used to model evolutionary history (even worse than eukaryotes do, but that's a different story). Until the bacterial evolution community comes up with and embraces new methods to model prokaryotic evolution, leaps in our understanding of that process will be limited.

References
Zhaxybayeva O, Swithers KS, Lapierre P, Fournier GP, Bickhart DM, DeBoy RT, Nelson KE, Nesbø CL, Doolittle WF, Gogarten JP, & Noll KM (2009). On the chimeric nature, thermophilic origin, and phylogenetic placement of the Thermotogales. Proceedings of the National Academy of Sciences of the United States of America, 106 (14), 5865-70 PMID: 19307556

Lawrence, J., & Retchless, A. (2010). The myth of bacterial species and speciation. Biology & Philosophy, 25 (4), 569-588 DOI: 10.1007/s10539-010-9215-5

Bapteste, E., O'Malley, M., Beiko, R., Ereshefsky, M., Gogarten, J., Franklin-Hall, L., Lapointe, F., Dupré, J., Dagan, T., Boucher, Y., & Martin, W. (2009). Prokaryotic evolution and the tree of life are two different things. Biology Direct, 4 (1) DOI: 10.1186/1745-6150-4-34

6 responses so far

  • becca says:

    For the purposes of functionally classifying organisms, a lot of things may eventually be solved by simply having the capacity to look at every gene for vast numbers of organisms.
    The ribosomal SSU wasn't chosen only because it's a gene everything has in common, but also because it has regions that mutate faster and regions that mutate slower, facilitating comparisons between more similar and more different organisms. That idea was good when we were limited in sequencing capacity, but now it probably makes more sense to do the whole genome.

    Ultimately, I think those studying prokaryotic evolution are just going to have to abandon the idea that 'more similar' organisms = 'more recent genetic exchange'. It's a PITA, because we really won't get good calibration of the history (i.e. I think we'll be able to know the sequence of genetic events, but not really how many years apart they occurred; I don't think the rate of mutation was constant, or even varied according to a nice predictable function, during the bulk of the evolution of life).

    I recently went to a great talk from a structural biologist about the ATPases, where he was trying to figure out things based on the question "what did the Last Universal Common Ancestor have?". It worked well for things like coming up with a reasonable rationale for the theory that Na+ pumps had to precede the H+ pumps (because you had to evolve more complicated, or at least less porous, membranes before you can use H+ with any efficacy). But it doesn't tell us whether ATP pumps, and the membrane gradient they allow, should be considered the essential feature of life, or whether the previous incarnation of the ATP pump (an RNA helicase), and the nucleic acid replication it enables, should be considered the essential feature of life. It's awfully hard to figure out if we want the last common cell, or the last common bit of nucleic code.
    When you get THAT far back in evolution, the need for ANY toy to analyze what little ambiguous data we've got is very pressing.

  • Dr. O says:

    Ultimately, I think those studying prokaryotic evolution are just going to have to abandon the idea that ‘more similar’ organisms = ‘more recent genetic exchange’. It’s a pity, because we really won’t get good calibration of the history (i.e. I think we’ll be able to know the sequence of genetic events, but not really how many years apart they occurred; I don’t think the rate of mutation was constant, or even varied according to a nice predictable function, during the bulk of the evolution of life).

    I think you hit the nail on the head, Becca (and PLS). As a bacterial geneticist/pathogenesis researcher, I find myself so overwhelmed when others try to map out bacterial evolution with a clear-cut linear timeline.

    For those interested, the Salmonella, Shigella, E. coli, pathogenic E. coli story is an especially interesting one with regard to lateral gene transfer and bacterial relatedness, IMO - chock-full of LGT twists and turns!

  • Graham says:

    Interesting post and comments! I'm a math/computer person interested in phylogenetic analysis - in other words, I'm interested in developing new toys (which I spell like this: tools). Here are some fairly random thoughts and questions. If I've said something wrong, please tell me.

    All models are wrong but some are useful. The question is not whether assumptions are being violated (of course they are) but how much the results are getting messed up. (Does anyone actually know that?) It's also relative: like becca said, if you're desperate, any model is better than none.

    I don't have access to the paper by Lawrence and Retchless. How much worse is the problem of lineage separation for prokaryotes than eukaryotes? How do we know? People have been worrying about reconciling gene trees (mostly in eukaryotes I presume) for 30 years.

    becca said "I don’t think the rate of mutation was constant, or even varied according to a nice predictable function, during the bulk of the evolution of life." I don't think so either, but relaxed-clock models allow for unpredictable changes in mutation rates. The question is, how relaxed? My guess is that most variation is due to changes in generation length. If I recall correctly, generation time of E. coli in gut is 1 hour, in soil, more like 1000 hours, and I guess mutation rates would vary by a similar factor. If you can restrict to organisms in similar environments, maybe it's not so bad.

    It is not clear what time scales people are talking about. Kya, Mya, Bya? It says here (Speciation by Coyne and Orr) that 18% of E. coli loci were acquired by HGT over 200My. That doesn't seem like an insuperable problem if you're going back less than 200My. 2Bya looks tricky.

    Use the whole genome, sure, but how? If you put the whole thing into a single analysis, you're imposing a strict tree structure. If you get thousands of gene trees, what are you going to do with them? It might be better to look at events like gene duplication, or changes in gene order. I get the feeling that gene order is an underused approach.

    Or maybe look at 'micro-morphological' features. I'm thinking of things like features (=measurements, characteristics) of 3D protein structure, or features of metabolic pathways, and using those along with sequence data in a phylogenetic analysis.

  • proflikesubstance says:

    If the hope is to produce any meaningful model of how life has evolved from LUCA, then 2+ BYA is what we're talking. My guess, based only on what we know about extant lineages, is that change has been so massive and so constant, that we will not get anything even close to reality if we continue to bang our heads on the wall with traditional methods. You can relax clocks and you can allow for X% HGT, etc., but if the end goal is a tree I think you've already lost because it can't adequately represent reality. Not even as a compromise.

    My frustration is that we know this. We know that methods developed for organisms that fuse two haploid cells to form diploid offspring with almost exclusively vertical inheritance suck at modeling the evolutionary processes that occur in bacteria - namely mostly asexual cell division with unidirectional gene flow of often only a small subset of genes and sometimes across large gaps in evolutionary divergence. They can't account for 50 million year old bacteriophages that get swept into the water column from disturbed deep water sediment and suddenly infect new cells today with the suspended DNA of their ancient relatives. Despite knowing the massive limitations in the methods, people use them anyway rather than creating something that has some chance of getting an answer in the ball park of reality.

    The traditional methods work decently in situations of moderate divergence between eukaryotic lineages that behave and don't have long branch lengths. Why we apply these same methods to prokaryotic systems boggles the mind.

  • Graham says:

    50 million year old bacteriophages ??

    I can imagine bacteria getting stuck at the bottom of the ocean, slowing their metabolism to tiny rates, and re-emerging 50 million years later to swap genes with their long lost cousins, possibly with help of viruses. But DNA being preserved in a virus for 50 million years boggles my mind.

  • becca says:

    "My guess is that most variation is due to changes in generation length. If I recall correctly, generation time of E. coli in gut is 1 hour, in soil, more like 1000 hours, and I guess mutation rates would vary by a similar factor. If you can restrict to organisms in similar environments, maybe it's not so bad."
    First, think of the differences in radiation exposure as differences in atmosphere accumulate over the history of life. There are going to be a LOT of ways in which equal-time generations lead to VERY different mutation rates. Probably way more than a mere 1000x.
    Moreover, you've very much spoken like a eukaryotic biologist. In the distant past, generations could have been irrelevant. That is, mother-to-daughter (cell) transmission of DNA could possibly have been a relatively negligible source of mutations, if there really is a Darwinian threshold. Maybe it was almost ALL horizontal gene transfer. Never mind the wrinkle of how you define one generation vs. another in a pre-cell environment.

    "Use the whole genome, sure, but how? If you put the whole thing into a single analysis, you’re imposing a strict tree structure. If you get thousands of gene trees, what are you going to do with them? It might be better to look at events like gene duplication, or changes in gene order. I get the feeling that gene order is an underused approach."
    Not equally weighted, of course (not unless you're stamp collecting). Different weights for 'fast' and 'slow' regions, depending on what timescale of evolution you care about. Some genes are going to tend to move as units- it would be useful for hypothesis generation to group based on gene clusters. But the basic answer I have is "different trees for different topics".
    And perhaps not trees, but three dimensional clusters (more dimensions => more accurate but less fathomable model), or perhaps webs if we're talking HGT (something to learn from the signal transduction folks? At least the ones sensible enough to move away from linear pathway models. I am passionately frustrated with the limitations of linear pathways).

    Gene order in prokaryotes gets messy- multiple plasmids can easily have different orders within one cell.
    There's been some good work done with metabolic pathways, I think. Although we have to keep in mind that any method that does neat things for stuff we can cultivate (at least long enough to measure its chemical inputs and outputs) is likely to be terribly skewed.
