1,293 results on "Wolf Yuri I"
Search Results
2. Microbial diversity and ecological complexity emerging from environmental variation and horizontal gene transfer in a simple mathematical model
- Author
Babajanyan, Sanasar G., Garushyants, Sofya K., Wolf, Yuri I., and Koonin, Eugene V.
- Published
- 2024
3. Addendum: Cellular differentiation into hyphae and spores in halophilic archaea
- Author
Tang, Shu-Kun, Zhi, Xiao-Yang, Zhang, Yao, Makarova, Kira S., Liu, Bing-Bing, Zheng, Guo-Song, Zhang, Zhen-Peng, Zheng, Hua-Jun, Wolf, Yuri I., Zhao, Yu-Rong, Jiang, Song-Hao, Chen, Xi-Ming, Li, En-Yuan, Zhang, Tao, Chen, Pei-Ru, Feng, Yu-Zhou, Xiang, Ming-Xian, Lin, Zhi-Qian, Shi, Jia-Hui, Chang, Cheng, Zhang, Xue, Li, Rui, Lou, Kai, Wang, Yun, Chang, Lei, Yin, Min, Yang, Ling-Ling, Gao, Hui-Ying, Zhang, Zhong-Kai, Tao, Tian-Shen, Guan, Tong-Wei, He, Fu-Chu, Lu, Yin-Hua, Cui, Heng-Lin, Koonin, Eugene V., Zhao, Guo-Ping, and Xu, Ping
- Published
- 2024
4. Evolution of optimal growth temperature in Asgard archaea inferred from the temperature dependence of GDP binding to EF-1A
- Author
Lu, Zhongyi, Xia, Runyue, Zhang, Siyu, Pan, Jie, Liu, Yang, Wolf, Yuri I., Koonin, Eugene V., and Li, Meng
- Published
- 2024
5. Mining metatranscriptomes reveals a vast world of viroid-like circular RNAs
- Author
Lee, Benjamin D, Neri, Uri, Roux, Simon, Wolf, Yuri I, Camargo, Antonio Pedro, Krupovic, Mart, Consortium, RNA Virus Discovery, Simmonds, Peter, Kyrpides, Nikos, Gophna, Uri, Dolja, Valerian V, and Koonin, Eugene V
- Subjects
Biological Sciences, RNA, Circular, Viroids, RNA, Catalytic, RNA, Viral, Ecosystem, Plant Diseases, RNA Virus Discovery Consortium, Medical and Health Sciences, Developmental Biology, Biological sciences, Biomedical and clinical sciences
- Abstract
Viroids and viroid-like covalently closed circular (ccc) RNAs are minimal replicators that typically encode no proteins and hijack cellular enzymes for replication. The extent and diversity of viroid-like agents are poorly understood. We developed a computational pipeline to identify viroid-like cccRNAs and applied it to 5,131 metatranscriptomes and 1,344 plant transcriptomes. The search yielded 11,378 viroid-like cccRNAs spanning 4,409 species-level clusters, a 5-fold increase compared to the previously identified viroid-like elements. Within this diverse collection, we discovered numerous putative viroids, satellite RNAs, retrozymes, and ribozy-like viruses. Diverse ribozyme combinations and unusual ribozymes within the cccRNAs were identified. Self-cleaving ribozymes were identified in ambiviruses, some mito-like viruses and capsid-encoding satellite virus-like cccRNAs. The broad presence of viroid-like cccRNAs in diverse transcriptomes and ecosystems implies that their host range is far broader than currently known, and matches to CRISPR spacers suggest that some cccRNAs replicate in prokaryotes.
- Published
- 2023
6. Expansion of the global RNA virome reveals diverse clades of bacteriophages
- Author
Neri, Uri, Wolf, Yuri I, Roux, Simon, Camargo, Antonio Pedro, Lee, Benjamin, Kazlauskas, Darius, Chen, I Min, Ivanova, Natalia, Allen, Lisa Zeigler, Paez-Espino, David, Bryant, Donald A, Bhaya, Devaki, Consortium, RNA Virus Discovery, Narrowe, Adrienne B, Probst, Alexander J, Sczyrba, Alexander, Kohler, Annegret, Séguin, Armand, Shade, Ashley, Campbell, Barbara J, Lindahl, Björn D, Reese, Brandi Kiel, Roque, Breanna M, DeRito, Chris, Averill, Colin, Cullen, Daniel, Beck, David AC, Walsh, David A, Ward, David M, Wu, Dongying, Eloe-Fadrosh, Emiley, Brodie, Eoin L, Young, Erica B, Lilleskov, Erik A, Castillo, Federico J, Martin, Francis M, LeCleir, Gary R, Attwood, Graeme T, Cadillo-Quiroz, Hinsby, Simon, Holly M, Hewson, Ian, Grigoriev, Igor V, Tiedje, James M, Jansson, Janet K, Lee, Janey, VanderGheynst, Jean S, Dangl, Jeff, Bowman, Jeff S, Blanchard, Jeffrey L, Bowen, Jennifer L, Xu, Jiangbing, Banfield, Jillian F, Deming, Jody W, Kostka, Joel E, Gladden, John M, Rapp, Josephine Z, Sharpe, Joshua, McMahon, Katherine D, Treseder, Kathleen K, Bidle, Kay D, Wrighton, Kelly C, Thamatrakoln, Kimberlee, Nusslein, Klaus, Meredith, Laura K, Ramirez, Lucia, Buee, Marc, Huntemann, Marcel, Kalyuzhnaya, Marina G, Waldrop, Mark P, Sullivan, Matthew B, Schrenk, Matthew O, Hess, Matthias, Vega, Michael A, O’Malley, Michelle A, Medina, Monica, Gilbert, Naomi E, Delherbe, Nathalie, Mason, Olivia U, Dijkstra, Paul, Chuckran, Peter F, Baldrian, Petr, Constant, Philippe, Stepanauskas, Ramunas, Daly, Rebecca A, Lamendella, Regina, Gruninger, Robert J, McKay, Robert M, Hylander, Samuel, Lebeis, Sarah L, Esser, Sarah P, Acinas, Silvia G, Wilhelm, Steven S, Singer, Steven W, Tringe, Susannah S, Woyke, Tanja, Reddy, TBK, Bell, Terrence H, Mock, Thomas, McAllister, Tim, and Thiel, Vera
- Subjects
Microbiology, Biological Sciences, Bioinformatics and Computational Biology, Infectious Diseases, Genetics, Microbiome, Biotechnology, Infection, Bacteriophages, DNA-Directed RNA Polymerases, Genome, Viral, Phylogeny, RNA, RNA Viruses, RNA-Dependent RNA Polymerase, Virome, RNA Virus Discovery Consortium, Bacteriophage, Functional protein annotation, Metatranscriptomics, RNA Virus, RNA dependent RNA polymerase, Viral Ecology, Virus, Virus - Host prediction, viral phylogeny, Medical and Health Sciences, Developmental Biology, Biological sciences, Biomedical and clinical sciences
- Abstract
High-throughput RNA sequencing offers broad opportunities to explore the Earth RNA virome. Mining 5,150 diverse metatranscriptomes uncovered >2.5 million RNA virus contigs. Analysis of >330,000 RNA-dependent RNA polymerases (RdRPs) shows that this expansion corresponds to a 5-fold increase of the known RNA virus diversity. Gene content analysis revealed multiple protein domains previously not found in RNA viruses and implicated in virus-host interactions. Extended RdRP phylogeny supports the monophyly of the five established phyla and reveals two putative additional bacteriophage phyla and numerous putative additional classes and orders. The dramatically expanded phylum Lenarviricota, consisting of bacterial and related eukaryotic viruses, now accounts for a third of the RNA virome. Identification of CRISPR spacer matches and bacteriolytic proteins suggests that subsets of picobirnaviruses and partitiviruses, previously associated with eukaryotes, infect prokaryotic hosts.
- Published
- 2022
7. Thermodynamics of Evolution and the Origin of Life
- Author
Vanchurin, Vitaly, Wolf, Yuri I., Koonin, Eugene V., and Katsnelson, Mikhail I.
- Subjects
Quantitative Biology - Populations and Evolution, Condensed Matter - Disordered Systems and Neural Networks, Condensed Matter - Statistical Mechanics, Computer Science - Machine Learning
- Abstract
We outline a phenomenological theory of evolution and origin of life by combining the formalism of classical thermodynamics with a statistical description of learning. The maximum entropy principle constrained by the requirement for minimization of the loss function is employed to derive a canonical ensemble of organisms (population), the corresponding partition function (macroscopic counterpart of fitness) and free energy (macroscopic counterpart of additive fitness). We further define the biological counterparts of temperature (biological temperature) as the measure of stochasticity of the evolutionary process and of chemical potential (evolutionary potential) as the amount of evolutionary work required to add a new trainable variable (such as an additional gene) to the evolving system. We then develop a phenomenological approach to the description of evolution, which involves modeling the grand potential as a function of the biological temperature and evolutionary potential. We demonstrate how this phenomenological approach can be used to study the "ideal mutation" model of evolution and its generalizations. Finally, we show that, within this thermodynamics framework, major transitions in evolution, such as the transition from an ensemble of molecules to an ensemble of organisms, that is, the origin of life, can be modeled as a special case of bona fide physical phase transitions that are associated with the emergence of a new type of grand canonical ensemble and the corresponding new level of description.
- Comment
23 pages
- Published
- 2021
8. Towards a Theory of Evolution as Multilevel Learning
- Author
Vanchurin, Vitaly, Wolf, Yuri I., Katsnelson, Mikhail I., and Koonin, Eugene V.
- Subjects
Quantitative Biology - Populations and Evolution, Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Machine Learning
- Abstract
We apply the theory of learning to physically renormalizable systems in an attempt to develop a theory of biological evolution, including the origin of life, as multilevel learning. We formulate seven fundamental principles of evolution that appear to be necessary and sufficient to render a universe observable and show that they entail the major features of biological evolution, including replication and natural selection. These principles also follow naturally from the theory of learning. We formulate the theory of evolution using the mathematical framework of neural networks, which provides for detailed analysis of evolutionary phenomena. To demonstrate the potential of the proposed theoretical framework, we derive a generalized version of the Central Dogma of molecular biology by analyzing the flow of information during learning (back-propagation) and predicting (forward-propagation) the environment by evolving organisms. The more complex evolutionary phenomena, such as major transitions in evolution, in particular, the origin of life, have to be analyzed in the thermodynamic limit, which is described in detail in the accompanying paper.
- Comment
29 pages, 3 figures
- Published
- 2021
9. Cellular differentiation into hyphae and spores in halophilic archaea
- Author
Tang, Shu-Kun, Zhi, Xiao-Yang, Zhang, Yao, Makarova, Kira S., Liu, Bing-Bing, Zheng, Guo-Song, Zhang, Zhen-Peng, Zheng, Hua-Jun, Wolf, Yuri I., Zhao, Yu-Rong, Jiang, Song-Hao, Chen, Xi-Ming, Li, En-Yuan, Zhang, Tao, Chen, Pei-Ru, Feng, Yu-Zhou, Xiang, Ming-Xian, Lin, Zhi-Qian, Shi, Jia-Hui, Chang, Cheng, Zhang, Xue, Li, Rui, Lou, Kai, Wang, Yun, Chang, Lei, Yin, Min, Yang, Ling-Ling, Gao, Hui-Ying, Zhang, Zhong-Kai, Tao, Tian-Shen, Guan, Tong-Wei, He, Fu-Chu, Lu, Yin-Hua, Cui, Heng-Lin, Koonin, Eugene V., Zhao, Guo-Ping, and Xu, Ping
- Published
- 2023
10. Punctuated equilibrium as the default mode of evolution of large populations on fitness landscapes dominated by saddle points in the weak-mutation limit
- Author
Bakhtin, Yuri, Katsnelson, Mikhail I., Wolf, Yuri I., and Koonin, Eugene V.
- Subjects
Quantitative Biology - Populations and Evolution, Condensed Matter - Statistical Mechanics, Nonlinear Sciences - Adaptation and Self-Organizing Systems
- Abstract
Punctuated equilibrium is a mode of evolution in which phenetic change occurs in rapid bursts that are separated by much longer intervals of stasis during which mutations accumulate but no major phenotypic change occurs. Punctuated equilibrium has been originally proposed within the framework of paleobiology, to explain the lack of transitional forms that is typical of the fossil record. Theoretically, punctuated equilibrium has been linked to self-organized criticality (SOC), a model in which the size of avalanches in an evolving system is power-law distributed, resulting in increasing rarity of major events. We show here that, under the weak-mutation limit, a large population would spend most of the time in stasis in the vicinity of saddle points in the fitness landscape. The periods of stasis are punctuated by fast transitions, in lnNe time (Ne, effective population size), when a new beneficial mutation is fixed in the evolving population, which moves to a different saddle, or on much rarer occasions, from a saddle to a local peak. Thus, punctuated equilibrium is the default mode of evolution under a simple model that does not involve SOC or other special conditions.
- Comment
25 pages, 2 figures
- Published
- 2020
11. Updated clusters of orthologous genes for Archaea: a complex ancestor of the Archaea and the byways of horizontal gene transfer
- Author
Wolf, Yuri I, Makarova, Kira S, Yutin, Natalya, and Koonin, Eugene V
- Subjects
Archaea, Orthologs, Horizontal gene transfer, Biology (General), QH301-705.5
- Abstract
Background: Collections of Clusters of Orthologous Genes (COGs) provide indispensable tools for comparative genomic analysis, evolutionary reconstruction and functional annotation of new genomes. Initially, COGs were made for all complete genomes of cellular life forms that were available at the time. However, with the accumulation of thousands of complete genomes, construction of a comprehensive COG set has become extremely computationally demanding and prone to error propagation, necessitating the switch to taxon-specific COG collections. Previously, we reported the collection of COGs for 41 genomes of Archaea (arCOGs). Here we present a major update of the arCOGs and describe evolutionary reconstructions to reveal general trends in the evolution of Archaea.
Results: The updated version of the arCOG database incorporates 91% of the pangenome of 120 archaea (251,032 protein-coding genes altogether) into 10,335 arCOGs. Using this new set of arCOGs, we performed maximum likelihood reconstruction of the genome content of archaeal ancestral forms and gene gain and loss events in archaeal evolution. This reconstruction shows that the last Common Ancestor of the extant Archaea was an organism of greater complexity than most of the extant archaea, probably with over 2,500 protein-coding genes. The subsequent evolution of almost all archaeal lineages was apparently dominated by gene loss resulting in genome streamlining. Overall, in the evolution of Archaea as well as a representative set of bacteria that was similarly analyzed for comparison, gene losses are estimated to outnumber gene gains at least 4 to 1. Analysis of specific patterns of gene gain in Archaea shows that, although some groups, in particular Halobacteria, acquire substantially more genes than others, on the whole, gene exchange between major groups of Archaea appears to be largely random, with no major ‘highways’ of horizontal gene transfer.
Conclusions: The updated collection of arCOGs is expected to become a key resource for comparative genomics, evolutionary reconstruction and functional annotation of new archaeal genomes. Given that, in spite of the major increase in the number of genomes, the conserved core of archaeal genes appears to be stabilizing, the major evolutionary trends revealed here have a chance to stand the test of time.
Reviewers: This article was reviewed by (for complete reviews see the Reviewers’ Reports section): Dr. PLG, Prof. PF, Dr. PL (nominated by Prof. JPG).
- Published
- 2012
12. Unification of Cas protein families and a simple scenario for the origin and evolution of CRISPR-Cas systems
- Author
Wolf, Yuri I, Aravind, L, Makarova, Kira S, and Koonin, Eugene V
- Subjects
Biology (General), QH301-705.5
- Abstract
Background: The CRISPR-Cas adaptive immunity systems that are present in most Archaea and many Bacteria function by incorporating fragments of alien genomes into specific genomic loci, transcribing the inserts and using the transcripts as guide RNAs to destroy the genome of the cognate virus or plasmid. This RNA interference-like immune response is mediated by numerous, diverse and rapidly evolving Cas (CRISPR-associated) proteins, several of which form the Cascade complex involved in the processing of CRISPR transcripts and cleavage of the target DNA. Comparative analysis of the Cas protein sequences and structures led to the classification of the CRISPR-Cas systems into three Types (I, II and III).
Results: A detailed comparison of the available sequences and structures of Cas proteins revealed several unnoticed homologous relationships. The Repeat-Associated Mysterious Proteins (RAMPs) containing a distinct form of the RNA Recognition Motif (RRM) domain, which are major components of the CRISPR-Cas systems, were classified into three large groups, Cas5, Cas6 and Cas7. Each of these groups includes many previously uncharacterized proteins now shown to adopt the RAMP structure. Evidence is presented that large subunits contained in most of the CRISPR-Cas systems could be homologous to Cas10 proteins which contain a polymerase-like Palm domain and are predicted to be enzymatically active in Type III CRISPR-Cas systems but inactivated in Type I systems. These findings, the fact that the CRISPR polymerases, RAMPs and Cas2 all contain core RRM domains, and distinct gene arrangements in the three types of CRISPR-Cas systems together provide for a simple scenario for origin and evolution of the CRISPR-Cas machinery. Under this scenario, the CRISPR-Cas system originated in thermophilic Archaea and subsequently spread horizontally among prokaryotes.
Conclusions: Because of the extreme diversity of CRISPR-Cas systems, in-depth sequence and structure comparison continue to reveal unexpected homologous relationships among Cas proteins. Unification of Cas protein families previously considered unrelated provides for improvement in the classification of CRISPR-Cas systems and a reconstruction of their evolution.
Open peer review: This article was reviewed by Malcolm White (nominated by Purificacion Lopez-Garcia), Frank Eisenhaber and Igor Zhulin. For the full reviews, see the Reviewers' Comments section.
- Published
- 2011
13. The common ancestry of life
- Author
Wolf, Yuri I and Koonin, Eugene V
- Subjects
Biology (General), QH301-705.5
- Abstract
Background: It is common belief that all cellular life forms on earth have a common origin. This view is supported by the universality of the genetic code and the universal conservation of multiple genes, particularly those that encode key components of the translation system. A remarkable recent study claims to provide a formal, homology independent test of the Universal Common Ancestry hypothesis by comparing the ability of a common-ancestry model and a multiple-ancestry model to predict sequences of universally conserved proteins.
Results: We devised a computational experiment on a concatenated alignment of universally conserved proteins which shows that the purported demonstration of the universal common ancestry is a trivial consequence of significant sequence similarity between the analyzed proteins. The nature and origin of this similarity are irrelevant for the prediction of "common ancestry" by the model-comparison approach. Thus, homology (common origin) of the compared proteins remains an inference from sequence similarity rather than an independent property demonstrated by the likelihood analysis.
Conclusion: A formal demonstration of the Universal Common Ancestry hypothesis has not been achieved and is unlikely to be feasible in principle. Nevertheless, the evidence in support of this hypothesis provided by comparative genomics is overwhelming.
Reviewers: This article was reviewed by William Martin, Ivan Iossifov (nominated by Andrey Rzhetsky) and Arcady Mushegian. For the complete reviews, see the Reviewers' Report section.
- Published
- 2010
14. Non-homologous isofunctional enzymes: A systematic analysis of alternative solutions in enzyme evolution
- Author
Wolf, Yuri I, Galperin, Michael Y, Omelchenko, Marina V, and Koonin, Eugene V
- Subjects
Biology (General), QH301-705.5
- Abstract
Background: Evolutionarily unrelated proteins that catalyze the same biochemical reactions are often referred to as analogous - as opposed to homologous - enzymes. The existence of numerous alternative, non-homologous enzyme isoforms presents an interesting evolutionary problem; it also complicates genome-based reconstruction of the metabolic pathways in a variety of organisms. In 1998, a systematic search for analogous enzymes resulted in the identification of 105 Enzyme Commission (EC) numbers that included two or more proteins without detectable sequence similarity to each other, including 34 EC nodes where proteins were known (or predicted) to have distinct structural folds, indicating independent evolutionary origins. In the past 12 years, many putative non-homologous isofunctional enzymes were identified in newly sequenced genomes. In addition, efforts in structural genomics resulted in a vastly improved structural coverage of proteomes, providing for definitive assessment of (non)homologous relationships between proteins.
Results: We report the results of a comprehensive search for non-homologous isofunctional enzymes (NISE) that yielded 185 EC nodes with two or more experimentally characterized - or predicted - structurally unrelated proteins. Of these NISE sets, only 74 were from the original 1998 list. Structural assignments of the NISE show over-representation of proteins with the TIM barrel fold and the nucleotide-binding Rossmann fold. From the functional perspective, the set of NISE is enriched in hydrolases, particularly carbohydrate hydrolases, and in enzymes involved in defense against oxidative stress.
Conclusions: These results indicate that at least some of the non-homologous isofunctional enzymes were recruited relatively recently from enzyme families that are active against related substrates and are sufficiently flexible to accommodate changes in substrate specificity.
Reviewers: This article was reviewed by Andrei Osterman, Keith F. Tipton (nominated by Martijn Huynen) and Igor B. Zhulin. For the full reviews, go to the Reviewers' comments section.
- Published
- 2010
15. Eukaryotic large nucleo-cytoplasmic DNA viruses: Clusters of orthologous genes and reconstruction of viral genome evolution
- Author
Koonin, Eugene V, Raoult, Didier, Wolf, Yuri I, and Yutin, Natalya
- Subjects
Infectious and parasitic diseases, RC109-216
- Abstract
Background: The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) comprise an apparently monophyletic class of viruses that infect a broad variety of eukaryotic hosts. Recent progress in isolation of new viruses and genome sequencing resulted in a substantial expansion of the NCLDV diversity, resulting in additional opportunities for comparative genomic analysis, and a demand for a comprehensive classification of viral genes.
Results: A comprehensive comparison of the protein sequences encoded in the genomes of 45 NCLDV belonging to 6 families was performed in order to delineate clusters of orthologous viral genes. Using previously developed computational methods for orthology identification, 1445 Nucleo-Cytoplasmic Virus Orthologous Groups (NCVOGs) were identified, of which 177 are represented in more than one NCLDV family. The NCVOGs were manually curated and annotated and can be used as a computational platform for functional annotation and evolutionary analysis of new NCLDV genomes. A maximum-likelihood reconstruction of the NCLDV evolution yielded a set of 47 conserved genes that were probably present in the genome of the common ancestor of this class of eukaryotic viruses. This reconstructed ancestral gene set is robust to the parameters of the reconstruction procedure and so is likely to accurately reflect the gene core of the ancestral NCLDV, indicating that this virus encoded a complex machinery of replication, expression and morphogenesis that made it relatively independent from host cell functions.
Conclusions: The NCVOGs are a flexible and expandable platform for genome analysis and functional annotation of newly characterized NCLDV. Evolutionary reconstructions employing NCVOGs point to complex ancestral viruses.
- Published
- 2009
16. Is evolution Darwinian or/and Lamarckian?
- Author
Wolf, Yuri I and Koonin, Eugene V
- Subjects
Biology (General), QH301-705.5
- Abstract
Background: The year 2009 is the 200th anniversary of the publication of Jean-Baptiste Lamarck's Philosophie Zoologique and the 150th anniversary of Charles Darwin's On the Origin of Species. Lamarck believed that evolution is driven primarily by non-randomly acquired, beneficial phenotypic changes, in particular, those directly affected by the use of organs, which Lamarck believed to be inheritable. In contrast, Darwin assigned a greater importance to random, undirected change that provided material for natural selection.
The concept: The classic Lamarckian scheme appears untenable owing to the non-existence of mechanisms for direct reverse engineering of adaptive phenotypic characters acquired by an individual during its life span into the genome. However, various evolutionary phenomena that came to the fore in the last few years seem to fit a more broadly interpreted (quasi)Lamarckian paradigm. The prokaryotic CRISPR-Cas system of defense against mobile elements seems to function via a bona fide Lamarckian mechanism, namely, by integrating small segments of viral or plasmid DNA into specific loci in the host prokaryote genome and then utilizing the respective transcripts to destroy the cognate mobile element DNA (or RNA). A similar principle seems to be employed in the piRNA branch of RNA interference which is involved in defense against transposable elements in the animal germ line. Horizontal gene transfer (HGT), a dominant evolutionary process, at least, in prokaryotes, appears to be a form of (quasi)Lamarckian inheritance. The rate of HGT and the nature of acquired genes depend on the environment of the recipient organism and, in some cases, the transferred genes confer a selective advantage for growth in that environment, meeting the Lamarckian criteria. Various forms of stress-induced mutagenesis are tightly regulated and comprise a universal adaptive response to environmental stress in cellular life forms. Stress-induced mutagenesis can be construed as a quasi-Lamarckian phenomenon because the induced genomic changes, although random, are triggered by environmental factors and are beneficial to the organism.
Conclusion: Both Darwinian and Lamarckian modalities of evolution appear to be important, and reflect different aspects of the interaction between populations and the environment.
Reviewers: This article was reviewed by Juergen Brosius, Valerian Dolja, and Martijn Huynen. For complete reports, see the Reviewers' reports section.
- Published
- 2009
17. The fundamental units, processes and patterns of evolution, and the Tree of Life conundrum
- Author
Wolf, Yuri I and Koonin, Eugene V
- Subjects
Biology (General), QH301-705.5
- Abstract
Background: The elucidation of the dominant role of horizontal gene transfer (HGT) in the evolution of prokaryotes led to a severe crisis of the Tree of Life (TOL) concept and intense debates on this subject.
Concept: Prompted by the crisis of the TOL, we attempt to define the primary units and the fundamental patterns and processes of evolution. We posit that replication of the genetic material is the singular fundamental biological process and that replication with an error rate below a certain threshold both enables and necessitates evolution by drift and selection. Starting from this proposition, we outline a general concept of evolution that consists of three major precepts.
1. The primary agency of evolution consists of Fundamental Units of Evolution (FUEs), that is, units of genetic material that possess a substantial degree of evolutionary independence. The FUEs include both bona fide selfish elements such as viruses, viroids, transposons, and plasmids, which encode some of the information required for their own replication, and regular genes that possess quasi-independence owing to their distinct selective value that provides for their transfer between ensembles of FUEs (genomes) and preferential replication along with the rest of the recipient genome.
2. The history of replication of a genetic element without recombination is isomorphously represented by a directed tree graph (an arborescence, in the graph theory language). Recombination within a FUE is common between very closely related sequences where homologous recombination is feasible but becomes negligible for longer evolutionary distances. In contrast, shuffling of FUEs occurs at all evolutionary distances. Thus, a tree is a natural representation of the evolution of an individual FUE on the macro scale, but not of an ensemble of FUEs such as a genome.
3. The history of life is properly represented by the "forest" of evolutionary trees for individual FUEs (Forest of Life, or FOL). Search for trends and patterns in the FOL is a productive direction of study that leads to the delineation of ensembles of FUEs that evolve coherently for a certain time span owing to a shared history of vertical inheritance or horizontal gene transfer; these ensembles are commonly known as genomes, taxa, or clades, depending on the level of analysis. A small set of genes (the universal genetic core of life) might show a (mostly) coherent evolutionary trend that transcends the entire history of cellular life forms. However, it might not be useful to denote this trend "the tree of life", or organismal, or species tree because neither organisms nor species are fundamental units of life.
Conclusion: A logical analysis of the units and processes of biological evolution suggests that the natural fundamental unit of evolution is a FUE, that is, a genetic element with an independent evolutionary history. Evolution of a FUE on the macro scale is naturally represented by a tree. Only the full compendium of trees for individual FUEs (the FOL) is an adequate depiction of the evolution of life. Coherent evolution of FUEs over extended evolutionary intervals is a crucial aspect of the history of life but a "species" or "organismal" tree is not a fundamental concept.
Reviewers: This article was reviewed by Valerian Dolja, W. Ford Doolittle, Nicholas Galtier, and William Martin.
- Published
- 2009
18. Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements
- Author
van der Oost, John, Wolf, Yuri I, Makarova, Kira S, and Koonin, Eugene V
- Subjects
Biology (General), QH301-705.5
- Abstract
Background: In eukaryotes, RNA interference (RNAi) is a major mechanism of defense against viruses and transposable elements as well as of regulating translation of endogenous mRNAs. The RNAi systems recognize the target RNA molecules via small guide RNAs that are completely or partially complementary to a region of the target. Key components of the RNAi systems are proteins of the Argonaute-PIWI family some of which function as slicers, the nucleases that cleave the target RNA that is base-paired to a guide RNA. Numerous prokaryotes possess the CRISPR-associated system (CASS) of defense against phages and plasmids that is, in part, mechanistically analogous but not homologous to eukaryotic RNAi systems. Many prokaryotes also encode homologs of Argonaute-PIWI proteins but their functions remain unknown.
Results: We present a detailed analysis of Argonaute-PIWI protein sequences and the genomic neighborhoods of the respective genes in prokaryotes. Whereas eukaryotic Ago/PIWI proteins always contain PAZ (oligonucleotide binding) and PIWI (active or inactivated nuclease) domains, the prokaryotic Argonaute homologs (pAgos) fall into two major groups in which the PAZ domain is either present or absent. The monophyly of each group is supported by a phylogenetic analysis of the conserved PIWI-domains. Almost all pAgos that lack a PAZ domain appear to be inactivated, and the respective genes are associated with a variety of predicted nucleases in putative operons. An additional, uncharacterized domain that is fused to various nucleases appears to be a unique signature of operons encoding the short (lacking PAZ) pAgo form. By contrast, almost all PAZ-domain containing pAgos are predicted to be active nucleases. Some proteins of this group (e.g., that from Aquifex aeolicus) have been experimentally shown to possess nuclease activity, and are not typically associated with genes for other (putative) nucleases. Given these observations, the apparent extensive horizontal transfer of pAgo genes, and their common, statistically significant over-representation in genomic neighborhoods enriched in genes encoding proteins involved in the defense against phages and/or plasmids, we hypothesize that pAgos are key components of a novel class of defense systems. The PAZ-domain containing pAgos are predicted to directly destroy virus or plasmid nucleic acids via their nuclease activity, whereas the apparently inactivated, PAZ-lacking pAgos could be structural subunits of protein complexes that contain, as active moieties, the putative nucleases that we predict to be co-expressed with these pAgos. All these nucleases are predicted to be DNA endonucleases, so it seems most probable that the putative novel phage/plasmid-defense system targets phage DNA rather than mRNAs. Given that in eukaryotic RNAi systems, the PAZ domain binds a guide RNA and positions it on the complementary region of the target, we further speculate that pAgos function on a similar principle (the guide being either DNA or RNA), and that the uncharacterized domain found in putative operons with the short forms of pAgos is a functional substitute for the PAZ domain.
Conclusion: The hypothesis that pAgos are key components of a novel prokaryotic immune system that employs guide RNA or DNA molecules to degrade nucleic acids of invading mobile elements implies a functional analogy with the prokaryotic CASS and a direct evolutionary connection with eukaryotic RNAi. The predictions of the hypothesis including both the activities of pAgos and those of the associated endonucleases are readily amenable to experimental tests.
Reviewers: This article was reviewed by Daniel Haft, Martijn Huynen, and Chris Ponting.
- Published
- 2009
- Full Text
- View/download PDF
19. Search for a 'Tree of Life' in the thicket of the phylogenetic forest
- Author
-
Puigbò Pere, Wolf Yuri I, and Koonin Eugene V
- Subjects
Biology (General) ,QH301-705.5 - Abstract
Abstract Background Comparative genomics has revealed extensive horizontal gene transfer among prokaryotes, a development that is often considered to undermine the 'tree of life' concept. However, the possibility remains that a statistical central trend still exists in the phylogenetic 'forest of life'. Results A comprehensive comparative analysis of a 'forest' of 6,901 phylogenetic trees for prokaryotic genes revealed a consistent phylogenetic signal, particularly among 102 nearly universal trees, despite high levels of topological inconsistency, probably due to horizontal gene transfer. Horizontal transfers seemed to be distributed randomly and did not obscure the central trend. The nearly universal trees were topologically similar to numerous other trees. Thus, the nearly universal trees might reflect a significant central tendency, although they cannot represent the forest completely. However, topological consistency was seen mostly at shallow tree depths and abruptly dropped at the level of the radiation of archaeal and bacterial phyla, suggesting that early phases of evolution could be non-tree-like (Biological Big Bang). Simulations of evolution under compressed cladogenesis or Biological Big Bang yielded a better fit to the observed dependence between tree inconsistency and phylogenetic depth for the compressed cladogenesis model. Conclusions Horizontal gene transfer is pervasive among prokaryotes: very few gene trees are fully consistent, making the original tree of life concept obsolete. A central trend that most probably represents vertical inheritance is discernible throughout the evolution of archaea and bacteria, although compressed cladogenesis complicates unambiguous resolution of the relationships between the major archaeal and bacterial clades.
- Published
- 2009
- Full Text
- View/download PDF
20. Comprehensive comparative-genomic analysis of Type 2 toxin-antitoxin systems and related mobile stress response systems in prokaryotes
- Author
-
Makarova Kira S, Wolf Yuri I, and Koonin Eugene V
- Subjects
Biology (General) ,QH301-705.5 - Abstract
Abstract Background The prokaryotic toxin-antitoxin systems (TAS, also referred to as TA loci) are widespread, mobile two-gene modules that can be viewed as selfish genetic elements because they evolved mechanisms to become addictive for replicons and cells in which they reside, but also possess "normal" cellular functions in various forms of stress response and management of prokaryotic population. Several distinct TAS of type 1, where the toxin is a protein and the antitoxin is an antisense RNA, and numerous, unrelated TAS of type 2, in which both the toxin and the antitoxin are proteins, have been experimentally characterized, and it is suspected that many more remain to be identified. Results We report a comprehensive comparative-genomic analysis of Type 2 toxin-antitoxin systems in prokaryotes. Using sensitive methods for distant sequence similarity search, genome context analysis and a new approach for the identification of mobile two-component systems, we identified numerous, previously unnoticed protein families that are homologous to toxins and antitoxins of known type 2 TAS. In addition, we predict 12 new families of toxins and 13 families of antitoxins, and also, predict a TAS or TAS-like activity for several gene modules that were not previously suspected to function in that capacity. In particular, we present indications that the two-gene module that encodes a minimal nucleotidyl transferase and the accompanying HEPN protein, and is extremely abundant in many archaea and bacteria, especially, thermophiles might comprise a novel TAS. We present a survey of previously known and newly predicted TAS in 750 complete genomes of archaea and bacteria, quantitatively demonstrate the exceptional mobility of the TAS, and explore the network of toxin-antitoxin pairings that combines plasticity with selectivity. 
Conclusion The defining properties of the TAS, namely, the typically small size of the toxin and antitoxin genes, fast evolution, and extensive horizontal mobility, make the task of comprehensive identification of these systems particularly challenging. However, these same properties can be exploited to develop context-based computational approaches which, combined with exhaustive analysis of subtle sequence similarities were employed in this work to substantially expand the current collection of TAS by predicting both previously unnoticed, derived versions of known toxins and antitoxins, and putative novel TAS-like systems. In a broader context, the TAS belong to the resistome domain of the prokaryotic mobilome which includes partially selfish, addictive gene cassettes involved in various aspects of stress response and organized under the same general principles as the TAS. The "selfish altruism", or "responsible selfishness", of TAS-like systems appears to be a defining feature of the resistome and an important characteristic of the entire prokaryotic pan-genome given that in the prokaryotic world the mobilome and the "stable" chromosomes form a dynamic continuum. Reviewers This paper was reviewed by Kenn Gerdes (nominated by Arcady Mushegian), Daniel Haft, Arcady Mushegian, and Andrei Osterman. For full reviews, go to the Reviewers' Reports section.
- Published
- 2009
- Full Text
- View/download PDF
21. The origins of phagocytosis and eukaryogenesis
- Author
-
Wolf Yuri I, Wolf Maxim Y, Yutin Natalya, and Koonin Eugene V
- Subjects
Biology (General) ,QH301-705.5 - Abstract
Abstract Background Phagocytosis, that is, engulfment of large particles by eukaryotic cells, is found in diverse organisms and is often thought to be central to the very origin of the eukaryotic cell, in particular, for the acquisition of bacterial endosymbionts including the ancestor of the mitochondrion. Results Comparisons of the sets of proteins implicated in phagocytosis in different eukaryotes reveal extreme diversity, with very few highly conserved components that typically do not possess readily identifiable prokaryotic homologs. Nevertheless, phylogenetic analysis of those proteins for which such homologs do exist yields clues to the possible origin of phagocytosis. The central finding is that a subset of archaea encode actins that are not only monophyletic with eukaryotic actins but also share unique structural features with actin-related proteins (Arp) 2 and 3. All phagocytic processes are strictly dependent on remodeling of the actin cytoskeleton and the formation of branched filaments for which Arp2/3 are responsible. The presence of common structural features in Arp2/3 and the archaeal actins suggests that the common ancestors of the archaeal and eukaryotic actins were capable of forming branched filaments, like modern Arp2/3. The Rho family GTPases that are ubiquitous regulators of phagocytosis in eukaryotes appear to be of bacterial origin, so assuming that the host of the mitochondrial endosymbiont was an archaeon, the genes for these GTPases come via horizontal gene transfer from the endosymbiont or in an earlier event. Conclusion The present findings suggest a hypothetical scenario of eukaryogenesis under which the archaeal ancestor of eukaryotes had no cell wall (like modern Thermoplasma) but had an actin-based cytoskeleton including branched actin filaments that allowed this organism to produce actin-supported membrane protrusions. 
These protrusions would facilitate accidental, occasional engulfment of bacteria, one of which eventually became the mitochondrion. The acquisition of the endosymbiont triggered eukaryogenesis, in particular, the emergence of the endomembrane system that eventually led to the evolution of modern-type phagocytosis, independently in several eukaryotic lineages. Reviewers This article was reviewed by Simonetta Gribaldo, Gaspar Jekely, and Pierre Pontarotti. For the full reviews, please go to the Reviewers' Reports section.
- Published
- 2009
- Full Text
- View/download PDF
22. Comparable contributions of structural-functional constraints and expression level to the rate of protein sequence evolution
- Author
-
Koonin Eugene V, Wolf Yuri I, and Wolf Maxim Y
- Subjects
Biology (General) ,QH301-705.5 - Abstract
Abstract Background Proteins show a broad range of evolutionary rates. Understanding the factors that are responsible for the characteristic rate of evolution of a given protein arguably is one of the major goals of evolutionary biology. A long-standing general assumption used to be that the evolution rate is, primarily, determined by the specific functional constraints that affect the given protein. These constraints were traditionally thought to depend both on the specific features of the protein's structure and its biological role. The advent of systems biology brought about new types of data, such as expression level and protein-protein interactions, and unexpectedly, a variety of correlations between protein evolution rate and these variables have been observed. The strongest connections by far were repeatedly seen between protein sequence evolution rate and the expression level of the respective gene. It has been hypothesized that this link is due to the selection for the robustness of the protein structure to mistranslation-induced misfolding that is particularly important for highly expressed proteins and is the dominant determinant of the sequence evolution rate. Results This work is an attempt to assess the relative contributions of protein domain structure and function, on the one hand, and expression level on the other hand, to the rate of sequence evolution. To this end, we performed a genome-wide analysis of the effect of the fusion of a pair of domains in multidomain proteins on the difference in the domain-specific evolutionary rates. The mistranslation-induced misfolding hypothesis would predict that, within multidomain proteins, fused domains, on average, should evolve at substantially closer rates than the same domains in different proteins because, within a multidomain protein, all domains are translated at the same rate.
We performed a comprehensive comparison of the evolutionary rates of mammalian and plant protein domains that are either joined in multidomain proteins or contained in distinct proteins. Substantial homogenization of evolutionary rates in multidomain proteins was, indeed, observed in both animals and plants, although highly significant differences between domain-specific rates remained. The contributions of the translation rate, as determined by the effect of the fusion of a pair of domains within a multidomain protein, and intrinsic, domain-specific structural-functional constraints appear to be comparable in magnitude. Conclusion Fusion of domains in a multidomain protein results in substantial homogenization of the domain-specific evolutionary rates but significant differences between domain-specific evolution rates remain. Thus, the rate of translation and intrinsic structural-functional constraints both exert sizable and comparable effects on sequence evolution. Reviewers This article was reviewed by Sergei Maslov, Dennis Vitkup, Claus Wilke (nominated by Orly Alter), and Allan Drummond (nominated by Joel Bader). For the full reviews, please go to the Reviewers' Reports section.
- Published
- 2008
- Full Text
- View/download PDF
23. Complete genome sequence of the extremely acidophilic methanotroph isolate V4, Methylacidiphilum infernorum, a representative of the bacterial phylum Verrucomicrobia
- Author
-
Stott Matthew B, Koonin Eugene V, Yutin Natalya, Wolf Yuri I, Omelchenko Marina V, Galperin Michael Y, Wang Jianmei, Ren Yan, Zhou Zhemin, Ly Benjamin V, Senin Pavel, Saw Jimmy HW, Makarova Kira S, Hou Shaobin, Mountain Bruce W, Crowe Michelle A, Smirnova Angela V, Dunfield Peter F, Feng Lu, Wang Lei, and Alam Maqsudul
- Subjects
Biology (General) ,QH301-705.5 - Abstract
Abstract Background The phylum Verrucomicrobia is a widespread but poorly characterized bacterial clade. Although cultivation-independent approaches detect representatives of this phylum in a wide range of environments, including soils, seawater, hot springs and human gastrointestinal tract, only a few have been isolated in pure culture. We have recently reported cultivation and initial characterization of an extremely acidophilic methanotrophic member of the Verrucomicrobia, strain V4, isolated from the Hell's Gate geothermal area in New Zealand. Similar organisms were independently isolated from geothermal systems in Italy and Russia. Results We report the complete genome sequence of strain V4, the first one from a representative of the Verrucomicrobia. Isolate V4, initially named "Methylokorus infernorum" (and recently renamed Methylacidiphilum infernorum) is an autotrophic bacterium with a streamlined genome of ~2.3 Mbp that encodes simple signal transduction pathways and has a limited potential for regulation of gene expression. Central metabolism of M. infernorum was reconstructed almost completely and revealed highly interconnected pathways of autotrophic central metabolism and modifications of C1-utilization pathways compared to other known methylotrophs. The M. infernorum genome does not encode tubulin, which was previously discovered in bacteria of the genus Prosthecobacter, or close homologs of any other signature eukaryotic proteins. Phylogenetic analysis of ribosomal proteins and RNA polymerase subunits unequivocally supports grouping Planctomycetes, Verrucomicrobia and Chlamydiae into a single clade, the PVC superphylum, despite dramatically different gene content in members of these three groups. Comparative-genomic analysis suggests that evolution of the M. infernorum lineage involved extensive horizontal gene exchange with a variety of bacteria.
The genome of M. infernorum shows apparent adaptations for existence under extremely acidic conditions including a major upward shift in the isoelectric points of proteins. Conclusion The results of genome analysis of M. infernorum support the monophyly of the PVC superphylum. M. infernorum possesses a streamlined genome but seems to have acquired numerous genes including those for enzymes of methylotrophic pathways via horizontal gene transfer, in particular, from Proteobacteria. Reviewers This article was reviewed by John A. Fuerst, Ludmila Chistoserdova, and Radhey S. Gupta.
- Published
- 2008
- Full Text
- View/download PDF
24. Evolutionary primacy of sodium bioenergetics
- Author
-
Wolf Yuri I, Makarova Kira S, Galperin Michael Y, Mulkidjanian Armen Y, and Koonin Eugene V
- Subjects
Biology (General) ,QH301-705.5 - Abstract
Abstract Background The F- and V-type ATPases are rotary molecular machines that couple translocation of protons or sodium ions across the membrane to the synthesis or hydrolysis of ATP. Both the F-type (found in most bacteria and eukaryotic mitochondria and chloroplasts) and V-type (found in archaea, some bacteria, and eukaryotic vacuoles) ATPases can translocate either protons or sodium ions. The prevalent proton-dependent ATPases are generally viewed as the primary form of the enzyme whereas the sodium-translocating ATPases of some prokaryotes are usually construed as an exotic adaptation to survival in extreme environments. Results We combine structural and phylogenetic analyses to clarify the evolutionary relation between the proton- and sodium-translocating ATPases. A comparison of the structures of the membrane-embedded oligomeric proteolipid rings of sodium-dependent F- and V-ATPases reveals nearly identical sets of amino acids involved in sodium binding. We show that the sodium-dependent ATPases are scattered among proton-dependent ATPases in both the F- and the V-branches of the phylogenetic tree. Conclusion Barring convergent emergence of the same set of ligands in several lineages, these findings indicate that the use of sodium gradient for ATP synthesis is the ancestral modality of membrane bioenergetics. Thus, a primitive, sodium-impermeable but proton-permeable cell membrane that harboured a set of sodium-transporting enzymes appears to have been the evolutionary predecessor of the more structurally demanding proton-tight membranes. The use of proton as the coupling ion appears to be a later innovation that emerged on several independent occasions. Reviewers This article was reviewed by J. Peter Gogarten, Martijn A. Huynen, and Igor B. Zhulin. For the full reviews, please go to the Reviewers' comments section.
- Published
- 2008
- Full Text
- View/download PDF
25. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea
- Author
-
Wolf Yuri I, Novichkov Pavel S, Sorokin Alexander V, Makarova Kira S, and Koonin Eugene V
- Subjects
Biology (General) ,QH301-705.5 - Abstract
Abstract Background An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes. Results New Archaeal Clusters of Orthologous Genes (arCOGs) were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon) using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover ~88% of the genes in a genome compared to a ~76% coverage in COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; ~40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genomes) consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA) is conservatively estimated to possess 996 genes compared to 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively.
It is inferred that LACA was a chemoautotrophic hyperthermophile that, in addition to the core archaeal functions, encoded more idiosyncratic systems, e.g., the CASS systems of antivirus defense and some toxin-antitoxin systems. Conclusion The arCOGs provide a convenient, flexible framework for functional annotation of archaeal genomes, comparative genomics and evolutionary reconstructions. Genomic reconstructions suggest that the last common ancestor of archaea might have been (nearly) as advanced as the modern archaeal hyperthermophiles. ArCOGs and related information are available at: ftp://ftp.ncbi.nih.gov/pub/koonin/arCOGs/. Reviewers This article was reviewed by Peer Bork, Patrick Forterre, and Purificacion Lopez-Garcia.
- Published
- 2007
- Full Text
- View/download PDF
26. Patterns of intron gain and conservation in eukaryotic genes
- Author
-
Wolf Yuri I, Rogozin Igor B, Carmel Liran, and Koonin Eugene V
- Subjects
Evolution ,QH359-425 - Abstract
Abstract Background: The presence of introns in protein-coding genes is a universal feature of eukaryotic genome organization, and the genes of multicellular eukaryotes, typically, contain multiple introns, a substantial fraction of which share position in distant taxa, such as plants and animals. Depending on the methods and data sets used, researchers have reached opposite conclusions on the causes of the high fraction of shared introns in orthologous genes from distant eukaryotes. Some studies conclude that shared intron positions reflect, almost entirely, a remarkable evolutionary conservation, whereas others attribute it to parallel gain of introns. To resolve these contradictions, it is crucial to analyze the evolution of introns by using a model that minimally relies on arbitrary assumptions. Results: We developed a probabilistic model of evolution that allows for variability of intron gain and loss rates over branches of the phylogenetic tree, individual genes, and individual sites. Applying this model to an extended set of conserved eukaryotic genes, we find that parallel gain, on average, accounts for only ~8% of the shared intron positions. However, the distribution of parallel gains over the phylogenetic tree of eukaryotes is highly non-uniform. There are, practically, no parallel gains in closely related lineages, whereas for distant lineages, such as animals and plants, parallel gains appear to contribute up to 20% of the shared intron positions. In accord with these findings, we estimated that ancestral introns have a high probability to be retained in extant genomes, and conversely, that a substantial fraction of extant introns have retained their positions since the early stages of eukaryotic evolution. In addition, the density of sites that are available for intron insertion is estimated to be, approximately, one in seven basepairs. 
Conclusion: We obtained robust estimates of the contribution of parallel gain to the observed sharing of intron positions between eukaryotic species separated by different evolutionary distances. The results indicate that, although the contribution of parallel gains varies across the phylogenetic tree, the high level of intron position sharing is due, primarily, to evolutionary conservation. Accordingly, numerous introns appear to persist in the same position over hundreds of millions of years of evolution. This is compatible with recent observations of a negative correlation between the rate of intron gain and coding sequence evolution rate of a gene, suggesting that at least some of the introns are functionally relevant.
- Published
- 2007
- Full Text
- View/download PDF
27. Evolution of the genetic code: partial optimization of a random code for robustness to translation error in a rugged fitness landscape
- Author
-
Koonin Eugene V, Wolf Yuri I, and Novozhilov Artem S
- Subjects
Biology (General) ,QH301-705.5 - Abstract
Abstract Background The standard genetic code table has a distinctly non-random structure, with similar amino acids often encoded by codon series that differ by a single nucleotide substitution, typically, in the third or the first position of the codon. It has been repeatedly argued that this structure of the code results from selective optimization for robustness to translation errors such that translational misreading has the minimal adverse effect. Indeed, it has been shown in several studies that the standard code is more robust than a substantial majority of random codes. However, it remains unclear how much evolution the standard code underwent, what is the level of optimization, and what is the likely starting point. Results We explored possible evolutionary trajectories of the genetic code within a limited domain of the vast space of possible codes. Only those codes were analyzed for robustness to translation error that possess the same block structure and the same degree of degeneracy as the standard code. This choice of a small part of the vast space of possible codes is based on the notion that the block structure of the standard code is a consequence of the structure of the complex between the cognate tRNA and the codon in mRNA where the third base of the codon plays a minimum role as a specificity determinant. Within this part of the fitness landscape, a simple evolutionary algorithm, with elementary evolutionary steps comprising swaps of four-codon or two-codon series, was employed to investigate the optimization of codes for the maximum attainable robustness. The properties of the standard code were compared to the properties of four sets of codes, namely, purely random codes, random codes that are more robust than the standard code, and two sets of codes that resulted from optimization of the first two sets.
The comparison of these sets of codes with the standard code and its locally optimized version showed that, on average, optimization of random codes yielded evolutionary trajectories that converged at the same level of robustness to translation errors as the optimization path of the standard code; however, the standard code required considerably fewer steps to reach that level than an average random code. When evolution starts from random codes whose fitness is comparable to that of the standard code, they typically reach a much higher level of optimization than the standard code, i.e., the standard code is much closer to its local minimum (fitness peak) than most of the random codes with similar levels of robustness. Thus, the standard genetic code appears to be a point on an evolutionary trajectory from a random point (code) about half the way to the summit of the local peak. The fitness landscape of code evolution appears to be extremely rugged, containing numerous peaks with a broad distribution of heights, and the standard code is relatively unremarkable, being located on the slope of a moderate-height peak. Conclusion The standard code appears to be the result of partial optimization of a random code for robustness to errors of translation. The reason the code is not fully optimized could be the trade-off between the beneficial effect of increasing robustness to translation errors and the deleterious effect of codon series reassignment that becomes increasingly severe with growing complexity of the evolving system. Thus, evolution of the code can be represented as a combination of adaptation and frozen accident. Reviewers This article was reviewed by David Ardell, Allan Drummond (nominated by Laura Landweber), and Rob Knight.
- Published
- 2007
- Full Text
- View/download PDF
28. On the origin of the translation system and the genetic code in the RNA world by means of natural selection, exaptation, and subfunctionalization
- Author
-
Koonin Eugene V and Wolf Yuri I
- Subjects
Biology (General) ,QH301-705.5 - Abstract
Abstract Background The origin of the translation system is, arguably, the central and the hardest problem in the study of the origin of life, and one of the hardest in all evolutionary biology. The problem has a clear catch-22 aspect: high translation fidelity hardly can be achieved without a complex, highly evolved set of RNAs and proteins but an elaborate protein machinery could not evolve without an accurate translation system. The origin of the genetic code and whether it evolved on the basis of a stereochemical correspondence between amino acids and their cognate codons (or anticodons), through selectional optimization of the code vocabulary, as a "frozen accident" or via a combination of all these routes is another wide open problem despite extensive theoretical and experimental studies. Here we combine the results of comparative genomics of translation system components, data on interaction of amino acids with their cognate codons and anticodons, and data on catalytic activities of ribozymes to develop conceptual models for the origins of the translation system and the genetic code. Results Our main guide in constructing the models is the Darwinian Continuity Principle whereby a scenario for the evolution of a complex system must consist of plausible elementary steps, each conferring a distinct advantage on the evolving ensemble of genetic elements. Evolution of the translation system is envisaged to occur in a compartmentalized ensemble of replicating, co-selected RNA segments, i.e., in an RNA World containing ribozymes with versatile activities. Since evolution has no foresight, the translation system could not evolve in the RNA World as the result of selection for protein synthesis and must have been a by-product of evolution driven by selection for another function, i.e., the translation system evolved via the exaptation route.
It is proposed that the evolutionary process that eventually led to the emergence of translation started with the selection for ribozymes binding abiogenic amino acids that stimulated ribozyme-catalyzed reactions. The proposed scenario for the evolution of translation consists of the following steps: binding of amino acids to a ribozyme resulting in an enhancement of its catalytic activity; evolution of the amino-acid-stimulated ribozyme into a peptide ligase (predecessor of the large ribosomal subunit) yielding, initially, a unique peptide activating the original ribozyme and, possibly, other ribozymes in the ensemble; evolution of self-charging proto-tRNAs that were selected, initially, for accumulation of amino acids, and subsequently, for delivery of amino acids to the peptide ligase; joining of the peptide ligase with a distinct RNA molecule (predecessor of the small ribosomal subunit) carrying a built-in template for more efficient, complementary binding of charged proto-tRNAs; evolution of the ability of the peptide ligase to assemble peptides using exogenous RNAs as template for complementary binding of charged proto-tRNAs, yielding peptides with the potential to activate different ribozymes; evolution of the translocation function of the protoribosome leading to the production of increasingly longer peptides (the first proteins), i.e., the origin of translation. The specifics of the recognition of amino acids by proto-tRNAs and the origin of the genetic code depend on whether or not there is a physical affinity between amino acids and their cognate codons or anticodons, a problem that remains unresolved. Conclusion We describe a stepwise model for the origin of the translation system in the ancient RNA world such that each step confers a distinct advantage onto an ensemble of co-evolving genetic elements.
Under this scenario, the primary cause for the emergence of translation was the ability of amino acids and peptides to stimulate reactions catalyzed by ribozymes. Thus, the translation system might have evolved as the result of selection for ribozymes capable of, initially, efficient amino acid binding, and subsequently, synthesis of increasingly versatile peptides. Several aspects of this scenario are amenable to experimental testing. Reviewers This article was reviewed by Rob Knight, Doron Lancet, Alexander Mankin (nominated by Arcady Mushegian), and Arcady Mushegian.
- Published
- 2007
- Full Text
- View/download PDF
29. 2022 taxonomic update of phylum Negarnaviricota (Riboviria: Orthornavirae), including the large orders Bunyavirales and Mononegavirales
- Author
-
Kuhn, Jens H., Adkins, Scott, Alkhovsky, Sergey V., Avšič-Županc, Tatjana, Ayllón, María A., Bahl, Justin, Balkema-Buschmann, Anne, Ballinger, Matthew J., Bandte, Martina, Beer, Martin, Bejerman, Nicolas, Bergeron, Éric, Biedenkopf, Nadine, Bigarré, Laurent, Blair, Carol D., Blasdell, Kim R., Bradfute, Steven B., Briese, Thomas, Brown, Paul A., Bruggmann, Rémy, Buchholz, Ursula J., Buchmeier, Michael J., Bukreyev, Alexander, Burt, Felicity, Büttner, Carmen, Calisher, Charles H., Candresse, Thierry, Carson, Jeremy, Casas, Inmaculada, Chandran, Kartik, Charrel, Rémi N., Chiaki, Yuya, Crane, Anya, Crane, Mark, Dacheux, Laurent, Bó, Elena Dal, de la Torre, Juan Carlos, de Lamballerie, Xavier, de Souza, William M., de Swart, Rik L., Dheilly, Nolwenn M., Di Paola, Nicholas, Di Serio, Francesco, Dietzgen, Ralf G., Digiaro, Michele, Drexler, J. Felix, Duprex, W. Paul, Dürrwald, Ralf, Easton, Andrew J., Elbeaino, Toufic, Ergünay, Koray, Feng, Guozhong, Feuvrier, Claudette, Firth, Andrew E., Fooks, Anthony R., Formenty, Pierre B. H., Freitas-Astúa, Juliana, Gago-Zachert, Selma, García, María Laura, García-Sastre, Adolfo, Garrison, Aura R., Godwin, Scott E., Gonzalez, Jean-Paul J., de Bellocq, Joëlle Goüy, Griffiths, Anthony, Groschup, Martin H., Günther, Stephan, Hammond, John, Hepojoki, Jussi, Hierweger, Melanie M., Hongō, Seiji, Horie, Masayuki, Horikawa, Hidenori, Hughes, Holly R., Hume, Adam J., Hyndman, Timothy H., Jiāng, Dàohóng, Jonson, Gilda B., Junglen, Sandra, Kadono, Fujio, Karlin, David G., Klempa, Boris, Klingström, Jonas, Koch, Michel C., Kondō, Hideki, Koonin, Eugene V., Krásová, Jarmila, Krupovic, Mart, Kubota, Kenji, Kuzmin, Ivan V., Laenen, Lies, Lambert, Amy J., Lǐ, Jiànróng, Li, Jun-Min, Lieffrig, François, Lukashevich, Igor S., Luo, Dongsheng, Maes, Piet, Marklewitz, Marco, Marshall, Sergio H., Marzano, Shin-Yi L., McCauley, John W., Mirazimi, Ali, Mohr, Peter G., Moody, Nick J. 
G., Morita, Yasuaki, Morrison, Richard N., Mühlberger, Elke, Naidu, Rayapati, Natsuaki, Tomohide, Navarro, José A., Neriya, Yutaro, Netesov, Sergey V., Neumann, Gabriele, Nowotny, Norbert, Ochoa-Corona, Francisco M., Palacios, Gustavo, Pallandre, Laurane, Pallás, Vicente, Papa, Anna, Paraskevopoulou, Sofia, Parrish, Colin R., Pauvolid-Corrêa, Alex, Pawęska, Janusz T., Pérez, Daniel R., Pfaff, Florian, Plemper, Richard K., Postler, Thomas S., Pozet, Françoise, Radoshitzky, Sheli R., Ramos-González, Pedro L., Rehanek, Marius, Resende, Renato O., Reyes, Carina A., Romanowski, Víctor, Rubbenstroth, Dennis, Rubino, Luisa, Rumbou, Artemis, Runstadler, Jonathan A., Rupp, Melanie, Sabanadzovic, Sead, Sasaya, Takahide, Schmidt-Posthaus, Heike, Schwemmle, Martin, Seuberlich, Torsten, Sharpe, Stephen R., Shi, Mang, Sironi, Manuela, Smither, Sophie, Song, Jin-Won, Spann, Kirsten M., Spengler, Jessica R., Stenglein, Mark D., Takada, Ayato, Tesh, Robert B., Těšíková, Jana, Thornburg, Natalie J., Tischler, Nicole D., Tomitaka, Yasuhiro, Tomonaga, Keizō, Tordo, Noël, Tsunekawa, Kenta, Turina, Massimo, Tzanetakis, Ioannis E., Vaira, Anna Maria, van den Hoogen, Bernadette, Vanmechelen, Bert, Vasilakis, Nikos, Verbeek, Martin, von Bargen, Susanne, Wada, Jiro, Wahl, Victoria, Walker, Peter J., Whitfield, Anna E., Williams, John V., Wolf, Yuri I., Yamasaki, Junki, Yanagisawa, Hironobu, Ye, Gongyin, Zhang, Yong-Zhen, and Økland, Arnfinn Lodden
- Published
- 2022
- Full Text
- View/download PDF
30. Long intervals of stasis punctuated by bursts of positive selection in the seasonal evolution of influenza A virus
- Author
-
Koonin Eugene V, Holmes Edward C, Viboud Cecile, Wolf Yuri I, and Lipman David J
- Subjects
Biology (General) ,QH301-705.5 - Abstract
Abstract Background The interpandemic evolution of the influenza A virus hemagglutinin (HA) protein is commonly considered a paragon of rapid evolutionary change under positive selection in which amino acid replacements are fixed by virtue of their effect on antigenicity, enabling the virus to evade immune surveillance. Results We performed phylogenetic analyses of the recently obtained large and relatively unbiased samples of the HA sequences from 1995–2005 isolates of the H3N2 and H1N1 subtypes of influenza A virus. Unexpectedly, it was found that the evolution of H3N2 HA includes long intervals of generally neutral sequence evolution without apparent substantial antigenic change ("stasis" periods) that are characterized by an excess of synonymous over nonsynonymous substitutions per site, lack of association of amino acid replacements with epitope regions, and slow extinction of coexisting virus lineages. These long periods of stasis are punctuated by shorter intervals of rapid evolution under positive selection during which new dominant lineages quickly displace previously coexisting ones. The preponderance of positive selection during intervals of rapid evolution is supported by the dramatic excess of amino acid replacements in the epitope regions of HA compared to replacements in the rest of the HA molecule. In contrast, the stasis intervals showed a much more uniform distribution of replacements over the HA molecule, with a statistically significant difference in the rate of synonymous over nonsynonymous substitution in the epitope regions between the two modes of evolution. A number of parallel amino acid replacements – the same amino acid substitution occurring independently in different lineages – were also detected in H3N2 HA. These parallel mutations were, largely, associated with periods of rapid fitness change, indicating that there are major limitations on evolutionary pathways during antigenic change. 
The finding that stasis is the prevailing modality of H3N2 evolution suggests that antigenic changes that lead to an increase in fitness typically result from epistatic interactions between several amino acid substitutions in the HA and, perhaps, other viral proteins. The strains that become dominant due to increased fitness emerge from low frequency strains thanks to the last amino acid replacement that completes the set of replacements required to produce a significant antigenic change; no subset of substitutions results in a biologically significant antigenic change and corresponding fitness increase. In contrast to H3N2, no clear intervals of evolution under positive selection were detected for the H1N1 HA during the same time span. Thus, the ascendancy of H1N1 in some seasons is, most likely, caused by the drop in the relative fitness of the previously prevailing H3N2 lineages as the fraction of susceptible hosts decreases during the stasis intervals.
Table 1 Numbers of synonymous and nonsynonymous substitution per site (dN/dS) in H3N2 HA (dN/dS ratio by tree partition)
Protein sites | All branches | Trunk branches | Other branches
H3N2 HA | 0.27 ± 0.02 | 0.35 ± 0.08 | 0.26 ± 0.02
H3N2 HA1 | 0.37 ± 0.04 | 0.57 ± 0.15 | 0.34 ± 0.04
H3N2 HA2 | 0.13 ± 0.02 | 0.10 ± 0.05 | 0.14 ± 0.03
H3N2 epitopes | 0.63 ± 0.09 | 1.85 ± 0.82 | 0.53 ± 0.08
H3N2 non-epitopes | 0.15 ± 0.02 | 0.09 ± 0.04 | 0.16 ± 0.02
Conclusion We show that the common view of the evolution of influenza virus as a rapid, positive selection-driven process is, at best, incomplete. Rather, the interpandemic evolution of influenza appears to consist of extended intervals of stasis, which are characterized by neutral sequence evolution, punctuated by shorter intervals of rapid fitness increase when evolutionary change is driven by positive selection.
These observations have implications for influenza surveillance and vaccine formulation; in particular, the possibility exists that parallel amino acid replacements could serve as a predictor of new dominant strains. Reviewers Ron Fouchier (nominated by Andrey Rzhetsky), David Krakauer, Christopher Lee
- Published
- 2006
- Full Text
- View/download PDF
31. A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action
- Author
-
Wolf Yuri I, Shabalina Svetlana A, Grishin Nick V, Makarova Kira S, and Koonin Eugene V
- Subjects
Biology (General) ,QH301-705.5 - Abstract
Abstract Background All archaeal and many bacterial genomes contain Clustered Regularly Interspaced Short Palindrome Repeats (CRISPR) and variable arrays of the CRISPR-associated (cas) genes that have been previously implicated in a novel form of DNA repair on the basis of comparative analysis of their protein product sequences. However, the proximity of CRISPR and cas genes strongly suggests that they have related functions which is hard to reconcile with the repair hypothesis. Results The protein sequences of the numerous cas gene products were classified into ~25 distinct protein families; several new functional and structural predictions are described. Comparative-genomic analysis of CRISPR and cas genes leads to the hypothesis that the CRISPR-Cas system (CASS) is a mechanism of defense against invading phages and plasmids that functions analogously to the eukaryotic RNA interference (RNAi) systems. Specific functional analogies are drawn between several components of CASS and proteins involved in eukaryotic RNAi, including the double-stranded RNA-specific helicase-nuclease (dicer), the endonuclease cleaving target mRNAs (slicer), and the RNA-dependent RNA polymerase. However, none of the CASS components is orthologous to its apparent eukaryotic functional counterpart. It is proposed that unique inserts of CRISPR, some of which are homologous to fragments of bacteriophage and plasmid genes, function as prokaryotic siRNAs (psiRNA), by base-pairing with the target mRNAs and promoting their degradation or translation shutdown. Specific hypothetical schemes are developed for the functioning of the predicted prokaryotic siRNA system and for the formation of new CRISPR units with unique inserts encoding psiRNA conferring immunity to the respective newly encountered phages or plasmids. The unique inserts in CRISPR show virtually no similarity even between closely related bacterial strains which suggests their rapid turnover, on evolutionary scale. 
Corollaries of this finding are that, even among closely related prokaryotes, the most commonly encountered phages and plasmids are different and/or that the dominant phages and plasmids turn over rapidly. Conclusion We proposed previously that Cas proteins comprise a novel DNA repair system. The association of the cas genes with CRISPR and, especially, the presence, in CRISPR units, of unique inserts homologous to phage and plasmid genes make us abandon this hypothesis. It appears most likely that CASS is a prokaryotic system of defense against phages and plasmids that functions via the RNAi mechanism. The functioning of this system seems to involve integration of fragments of foreign genes into archaeal and bacterial chromosomes yielding heritable immunity to the respective agents. However, it appears that this inheritance is extremely unstable on the evolutionary scale such that the repertoires of unique psiRNAs are completely replaced even in closely related prokaryotes, presumably, in response to rapidly changing repertoires of dominant phages and plasmids. This article was reviewed by: Eric Bapteste, Patrick Forterre, and Martijn Huynen.
- Published
- 2006
- Full Text
- View/download PDF
32. Comparative genomics of Thermus thermophilus and Deinococcus radiodurans: divergent routes of adaptation to thermophily and radiation resistance
- Author
-
Daly Michael J, Zhai Min, Vasilenko Alexander, Matrosova Vera Y, Gaidamakova Elena K, Wolf Yuri I, Omelchenko Marina V, Koonin Eugene V, and Makarova Kira S
- Subjects
Evolution ,QH359-425 - Abstract
Abstract Background Thermus thermophilus and Deinococcus radiodurans belong to a distinct bacterial clade but have remarkably different phenotypes. T. thermophilus is a thermophile, which is relatively sensitive to ionizing radiation and desiccation, whereas D. radiodurans is a mesophile, which is highly radiation- and desiccation-resistant. Here we present an in-depth comparison of the genomes of these two related but differently adapted bacteria. Results By reconstructing the evolution of Thermus and Deinococcus after the divergence from their common ancestor, we demonstrate a high level of post-divergence gene flux in both lineages. Various aspects of the adaptation to high temperature in Thermus can be attributed to horizontal gene transfer from archaea and thermophilic bacteria; many of the horizontally transferred genes are located on the single megaplasmid of Thermus. In addition, the Thermus lineage has lost a set of genes that are still present in Deinococcus and many other mesophilic bacteria but are not common among thermophiles. By contrast, Deinococcus seems to have acquired numerous genes related to stress response systems from various bacteria. A comparison of the distribution of orthologous genes among the four partitions of the Deinococcus genome and the two partitions of the Thermus genome reveals homology between the Thermus megaplasmid (pTT27) and Deinococcus megaplasmid (DR177). Conclusion After the radiation from their common ancestor, the Thermus and Deinococcus lineages have taken divergent paths toward their distinct lifestyles. In addition to extensive gene loss, Thermus seems to have acquired numerous genes from thermophiles, which likely was the decisive contribution to its thermophilic adaptation. By contrast, Deinococcus lost few genes but seems to have acquired many bacterial genes that apparently enhanced its ability to survive different kinds of environmental stresses. 
Notwithstanding the accumulation of horizontally transferred genes, we also show that the single megaplasmid of Thermus and the DR177 megaplasmid of Deinococcus are homologous and probably were inherited from the common ancestor of these bacteria.
- Published
- 2005
- Full Text
- View/download PDF
33. Modularity and diversity of target selectors in Tn7 transposons
- Author
-
Faure, Guilhem, Saito, Makoto, Benler, Sean, Peng, Iris, Wolf, Yuri I., Strecker, Jonathan, Altae-Tran, Han, Neumann, Edwin, Li, David, Makarova, Kira S., Macrae, Rhiannon K., Koonin, Eugene V., and Zhang, Feng
- Published
- 2023
- Full Text
- View/download PDF
34. Gene family evolution: an in-depth theoretical and simulation analysis of non-linear birth-death-innovation models
- Author
-
Berezovskaya Faina S, Wolf Yuri I, Karev Georgy P, and Koonin Eugene V
- Subjects
Evolution ,QH359-425 - Abstract
Abstract Background The size distribution of gene families in a broad range of genomes is well approximated by a generalized Pareto function. Evolution of ensembles of gene families can be described with Birth, Death, and Innovation Models (BDIMs). Analysis of the properties of different versions of BDIMs has the potential of revealing important features of genome evolution. Results In this work, we extend our previous analysis of stochastic BDIMs. In addition to the previously examined rational BDIMs, we introduce potentially more realistic logistic BDIMs, in which birth/death rates are limited for the largest families, and show that their properties are similar to those of models that include no such limitation. We show that the mean time required for the formation of the largest gene families detected in eukaryotic genomes is limited by the mean number of duplications per gene and does not increase indefinitely with the model degree. Instead, this time reaches a minimum value, which corresponds to a non-linear rational BDIM with the degree of approximately 2.7. Even for this BDIM, the mean time of the largest family formation is orders of magnitude greater than any realistic estimates based on the timescale of life's evolution. We employed the embedding chains technique to estimate the expected number of elementary evolutionary events (gene duplications and deletions) preceding the formation of gene families of the observed size and found that the mean number of events exceeds the family size by orders of magnitude, suggesting a highly dynamic process of genome evolution. The variance of the time required for the formation of the largest families was found to be extremely large, with the coefficient of variation >> 1. This indicates that some gene families might grow much faster than the mean rate such that the minimal time required for family formation is more relevant for a realistic representation of genome evolution than the mean time. 
We determined this minimal time using Monte Carlo simulations of family growth from an ensemble of simultaneously evolving singletons. In these simulations, the time elapsed before the formation of the largest family was much shorter than the estimated mean time and was compatible with the timescale of evolution of eukaryotes. Conclusions The analysis of stochastic BDIMs presented here shows that non-linear versions of such models can well approximate not only the size distribution of gene families but also the dynamics of their formation during genome evolution. The fact that only higher degree BDIMs are compatible with the observed characteristics of genome evolution suggests that the growth of gene families is self-accelerating, which might reflect differential selective pressure acting on different genes.
- Published
- 2004
- Full Text
- View/download PDF
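The contrast drawn in the abstract of entry 34, between the mean time of family formation and the minimal time observed in an ensemble of simultaneously evolving singletons, can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation: it assumes a simple linear birth-death process (constant per-gene rates) instead of the non-linear rational BDIMs analyzed in the paper, and the names `passage_time` and `ensemble_times`, the rates, the target size, and the ensemble size are all hypothetical choices made for illustration.

```python
import random

def passage_time(target, birth, death, rng):
    """Simulate one family starting from a singleton.

    Assumes constant per-gene birth and death rates (a linear model,
    chosen for illustration only). Returns the time at which the family
    first reaches `target` genes, or None if it goes extinct first.
    """
    size, t = 1, 0.0
    while 0 < size < target:
        total_rate = (birth + death) * size      # rate of the next event
        t += rng.expovariate(total_rate)         # exponential waiting time
        if rng.random() < birth / (birth + death):
            size += 1                            # duplication
        else:
            size -= 1                            # deletion
    return t if size == target else None

def ensemble_times(n_families, target, birth, death, seed=0):
    """Formation times for the families in an ensemble that reach `target`."""
    rng = random.Random(seed)
    times = (passage_time(target, birth, death, rng) for _ in range(n_families))
    return [t for t in times if t is not None]

# Hypothetical parameters: 2000 singletons, target family size 20.
times = ensemble_times(2000, target=20, birth=1.0, death=0.5)
mean_time = sum(times) / len(times)
min_time = min(times)
print(f"families reaching target: {len(times)}")
print(f"mean time {mean_time:.2f}, minimal time {min_time:.2f}")
```

Even in this simplified setting, the fastest family in the ensemble forms well before the mean formation time, which is the qualitative point the abstract makes about using the minimal time.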
35. Duplicated genes evolve slower than singletons despite the initial rate increase
- Author
-
Koonin Eugene V, Wolf Yuri I, and Jordan I King
- Subjects
Evolution ,QH359-425 - Abstract
Abstract Background Gene duplication is an important mechanism that can lead to the emergence of new functions during evolution. The impact of duplication on the mode of gene evolution has been the subject of several theoretical and empirical comparative-genomic studies. It has been shown that, shortly after the duplication, genes seem to experience a considerable relaxation of purifying selection. Results Here we demonstrate two opposite effects of gene duplication on evolutionary rates. Sequence comparisons between paralogs show that, in accord with previous observations, a substantial acceleration in the evolution of paralogs occurs after duplication, presumably due to relaxation of purifying selection. The effect of gene duplication on evolutionary rate was also assessed by sequence comparison between orthologs that have paralogs (duplicates) and those that do not (singletons). It is shown that, in eukaryotes, duplicates, on average, evolve significantly slower than singletons. Eukaryotic ortholog evolutionary rates for duplicates are also negatively correlated with the number of paralogs per gene and the strength of selection between paralogs. A tally of annotated gene functions shows that duplicates tend to be enriched for proteins with known functions, particularly those involved in signaling and related cellular processes; by contrast, singletons include an over-abundance of poorly characterized proteins. Conclusions These results suggest that whether or not a gene duplicate is retained by selection depends critically on the pre-existing functional utility of the protein encoded by the ancestral singleton. Duplicates of genes of a higher biological import, which are subject to strong functional constraints on the sequence, are retained relatively more often. 
Thus, the evolutionary trajectory of duplicated genes appears to be determined by two opposing trends, namely, the post-duplication rate acceleration and the generally slow evolutionary rate owing to the high level of functional constraints.
- Published
- 2004
- Full Text
- View/download PDF
36. The COG database: an updated version includes eukaryotes
- Author
-
Sverdlov Alexander V, Smirnov Sergei, Rao B Sridhar, Nikolskaya Anastasia N, Mekhedov Sergei L, Mazumder Raja, Krylov Dmitri M, Koonin Eugene V, Kiryutin Boris, Jacobs Aviva R, Jackson John D, Fedorova Natalie D, Tatusov Roman L, Vasudevan Sona, Wolf Yuri I, Yin Jodie J, and Natale Darren A
- Subjects
Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Abstract Background The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. Results We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or ~54% of the analyzed eukaryotic 110,655 gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included into the KOGs; addition of new eukaryotic genomes is expected to result in substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of ~20% of the KOG set. 
This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (~1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes. Conclusion The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.
- Published
- 2003
- Full Text
- View/download PDF
37. Correction: No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly
- Author
-
Koonin Eugene V, Wolf Yuri I, and Jordan I King
- Subjects
Evolution ,QH359-425 - Published
- 2003
- Full Text
- View/download PDF
38. No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly
- Author
-
Koonin Eugene V, Wolf Yuri I, and Jordan I King
- Subjects
Evolution ,QH359-425 - Abstract
Abstract Background It has been suggested that rates of protein evolution are influenced, to a great extent, by the proportion of amino acid residues that are directly involved in protein function. In agreement with this hypothesis, recent work has shown a negative correlation between evolutionary rates and the number of protein-protein interactions. However, the extent to which the number of protein-protein interactions influences evolutionary rates remains unclear. Here, we address this question at several different levels of evolutionary relatedness. Results Manually curated data on the number of protein-protein interactions among Saccharomyces cerevisiae proteins was examined for possible correlation with evolutionary rates between S. cerevisiae and Schizosaccharomyces pombe orthologs. Only a very weak negative correlation between the number of interactions and evolutionary rate of a protein was observed. Furthermore, no relationship was found between a more general measure of the evolutionary conservation of S. cerevisiae proteins, based on the taxonomic distribution of their homologs, and the number of protein-protein interactions. However, when the proteins from yeast were assorted into discrete bins according to the number of interactions, it turned out that 6.5% of the proteins with the greatest number of interactions evolved, on average, significantly slower than the rest of the proteins. Comparisons were also performed using protein-protein interaction data obtained with high-throughput analysis of Helicobacter pylori proteins. No convincing relationship between the number of protein-protein interactions and evolutionary rates was detected, either for comparisons of orthologs from two completely sequenced H. pylori strains or for comparisons of H. pylori and Campylobacter jejuni orthologs, even when the proteins were classified into bins by the number of interactions. 
Conclusion The currently available comparative-genomic data do not support the hypothesis that the evolutionary rates of the majority of proteins substantially depend on the number of protein-protein interactions they are involved in. However, a small fraction of yeast proteins with the largest number of interactions (the hubs of the interaction network) tend to evolve slower than the bulk of the proteins.
- Published
- 2003
- Full Text
- View/download PDF
39. Birth and death of protein domains: A simple model of evolution explains power law behavior
- Author
-
Berezovskaya Faina S, Rzhetsky Andrey Y, Wolf Yuri I, Karev Georgy P, and Koonin Eugene V
- Subjects
Evolution ,QH359-425 - Abstract
Abstract Background Power distributions appear in numerous biological, physical and other contexts, which appear to be fundamentally different. In biology, power laws have been claimed to describe the distributions of the connections of enzymes and metabolites in metabolic networks, the number of interaction partners of a given protein, the number of members in paralogous families, and other quantities. In network analysis, power laws imply evolution of the network with preferential attachment, i.e. a greater likelihood of nodes being added to pre-existing hubs. Exploration of different types of evolutionary models in an attempt to determine which of them lead to power law distributions has the potential of revealing non-trivial aspects of genome evolution. Results A simple model of evolution of the domain composition of proteomes was developed, with the following elementary processes: i) domain birth (duplication with divergence), ii) death (inactivation and/or deletion), and iii) innovation (emergence from non-coding or non-globular sequences or acquisition via horizontal gene transfer). This formalism can be described as a birth, death and innovation model (BDIM). The formulas for equilibrium frequencies of domain families of different size and the total number of families at equilibrium are derived for a general BDIM. All asymptotics of equilibrium frequencies of domain families possible for the given type of models are found and their appearance depending on model parameters is investigated. It is proved that the power law asymptotics appears if, and only if, the model is balanced, i.e. domain duplication and deletion rates are asymptotically equal up to the second order. It is further proved that any power asymptotic with the degree not equal to -1 can appear only if the hypothesis of independence of the duplication/deletion rates on the size of a domain family is rejected.
Specific cases of BDIMs, namely simple, linear, polynomial and rational models, are considered in detail and the distributions of the equilibrium frequencies of domain families of different size are determined for each case. We apply the BDIM formalism to the analysis of the domain family size distributions in prokaryotic and eukaryotic proteomes and show an excellent fit between these empirical data and a particular form of the model, the second-order balanced linear BDIM. Calculation of the parameters of these models suggests surprisingly high innovation rates, comparable to the total domain birth (duplication) and elimination rates, particularly for prokaryotic genomes. Conclusions We show that a straightforward model of genome evolution, which does not explicitly include selection, is sufficient to explain the observed distributions of domain family sizes, in which power laws appear as asymptotic. However, for the model to be compatible with the data, there has to be a precise balance between domain birth, death and innovation rates, and this is likely to be maintained by selection. The developed approach is oriented at a mathematical description of evolution of domain composition of proteomes, but a simple reformulation could be applied to models of other evolving networks with preferential attachment.
- Published
- 2002
- Full Text
- View/download PDF
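The claim in the abstract of entry 39 that power-law asymptotics arise when birth and death rates are balanced can be checked numerically with a short sketch. The parametrization below is an assumption, not taken from the listing: family-level rates are written as birth_i = lam * (i + a) and death_i = delta * (i + b), and equilibrium frequencies are obtained from the standard detailed-balance relation of a birth-death chain, f_{i+1} * death_{i+1} = f_i * birth_i. The function names and the parameter values are hypothetical.

```python
import math

def log_equilibrium(n_max, a, b, theta=1.0):
    """Unnormalized log-frequencies log f_i for family sizes i = 1..n_max.

    Assumed rates: birth_i = lam * (i + a), death_i = delta * (i + b),
    with theta = lam / delta (theta == 1 is the balanced case).
    Requires a > -1 and b > -2 so that all rates used are positive.
    """
    logs = [0.0]  # log f_1 fixed at 0; only ratios matter here
    for i in range(1, n_max):
        # detailed balance: f_{i+1} = f_i * birth_i / death_{i+1}
        logs.append(logs[-1] + math.log(theta * (i + a)) - math.log(i + 1 + b))
    return logs

def tail_slope(logs, i_lo, i_hi):
    """Slope of log f_i versus log i between two (1-based) family sizes."""
    return (logs[i_hi - 1] - logs[i_lo - 1]) / (math.log(i_hi) - math.log(i_lo))

# Balanced, size-independent per-domain rates (a = b = 0): exponent -1.
print(tail_slope(log_equilibrium(2000, 0.0, 0.0), 1000, 2000))
# Balanced, with a != b: exponent approaches a - b - 1 (here about -2).
print(tail_slope(log_equilibrium(2000, 0.0, 1.0), 1000, 2000))
# Unbalanced (theta < 1): the tail decays far faster than any power law.
print(tail_slope(log_equilibrium(2000, 0.0, 0.0, theta=0.9), 1000, 2000))
```

Under these assumptions the frequencies reduce to a ratio of gamma functions, so in the balanced case the log-log slope tends to a - b - 1; the value -1 appears exactly when a equals b, which matches the abstract's statement about size-independent duplication and deletion rates.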
40. Genome trees constructed using five different approaches suggest new major bacterial clades
- Author
-
Tatusov Roman L, Grishin Nick V, Rogozin Igor B, Wolf Yuri I, and Koonin Eugene V
- Subjects
Evolution ,QH359-425 - Abstract
Abstract Background The availability of multiple complete genome sequences from diverse taxa prompts the development of new phylogenetic approaches, which attempt to incorporate information derived from comparative analysis of complete gene sets or large subsets thereof. Such attempts are particularly relevant because of the major role of horizontal gene transfer and lineage-specific gene loss, at least in the evolution of prokaryotes. Results Five largely independent approaches were employed to construct trees for completely sequenced bacterial and archaeal genomes: i) presence-absence of genomes in clusters of orthologous genes; ii) conservation of local gene order (gene pairs) among prokaryotic genomes; iii) parameters of identity distribution for probable orthologs; iv) analysis of concatenated alignments of ribosomal proteins; v) comparison of trees constructed for multiple protein families. All constructed trees support the separation of the two primary prokaryotic domains, bacteria and archaea, as well as some terminal bifurcations within the bacterial and archaeal domains. Beyond these obvious groupings, the trees made with different methods appeared to differ substantially in terms of the relative contributions of phylogenetic relationships and similarities in gene repertoires caused by similar life styles and horizontal gene transfer to the tree topology. The trees based on presence-absence of genomes in orthologous clusters and the trees based on conserved gene pairs appear to be strongly affected by gene loss and horizontal gene transfer. The trees based on identity distributions for orthologs and particularly the tree made of concatenated ribosomal protein sequences seemed to carry a stronger phylogenetic signal. The latter tree supported three potential high-level bacterial clades: i) Chlamydia-Spirochetes, ii) Thermotogales-Aquificales (bacterial hyperthermophiles), and iii) Actinomycetes-Deinococcales-Cyanobacteria.
The latter group also appeared to join the low-GC Gram-positive bacteria at a deeper tree node. These new groupings of bacteria were supported by the analysis of alternative topologies in the concatenated ribosomal protein tree using the Kishino-Hasegawa test and by a census of the topologies of 132 individual groups of orthologous proteins. Additionally, the results of this analysis put into question the sister-group relationship between the two major archaeal groups, Euryarchaeota and Crenarchaeota, and suggest instead that Euryarchaeota might be a paraphyletic group with respect to Crenarchaeota. Conclusions We conclude that, the extensive horizontal gene flow and lineage-specific gene loss notwithstanding, extension of phylogenetic analysis to the genome scale has the potential of uncovering deep evolutionary relationships between prokaryotic lineages.
- Published
- 2001
- Full Text
- View/download PDF
41. Jumping DNA polymerases in bacteriophages
- Author
-
Yutin, Natalya, Tolstoy, Igor, Mutz, Pascal, Wolf, Yuri I, Krupovic, Mart, and Koonin, Eugene V
- Published
- 2024
- Full Text
- View/download PDF
42. On the feasibility of saltational evolution
- Author
-
Katsnelson, Mikhail I., Wolf, Yuri I., and Koonin, Eugene V.
- Subjects
Quantitative Biology - Populations and Evolution - Abstract
Is evolution always gradual or can it make leaps? We examine a mathematical model of an evolutionary process on a fitness landscape and obtain analytic solutions for the probability of multi-mutation leaps, that is, several mutations occurring simultaneously, within a single generation in one genome, and being fixed all together in the evolving population. The results indicate that, for typical, empirically observed combinations of the parameters of the evolutionary process, namely, effective population size, mutation rate, and distribution of selection coefficients of mutations, the probability of a multi-mutation leap is low, and accordingly, the contribution of such leaps is minor at best. However, we show that, taking sign epistasis into account, leaps could become an important factor of evolution in cases of substantially elevated mutation rates, such as stress-induced mutagenesis in microbes. We hypothesize that stress-induced mutagenesis is an evolvable adaptive strategy. Comment: Extended version; in particular, a section on a non-equilibrium model of stress-induced mutagenesis is added
- Published
- 2018
- Full Text
- View/download PDF
43. Physical foundations of biological complexity
- Author
-
Wolf, Yuri I., Katsnelson, Mikhail I., and Koonin, Eugene V.
- Subjects
Condensed Matter - Statistical Mechanics ,Condensed Matter - Disordered Systems and Neural Networks ,Quantitative Biology - Populations and Evolution - Abstract
Biological systems reach hierarchical complexity that has no counterpart outside the realm of biology. Undoubtedly, biological entities obey the fundamental physical laws. Can today's physics provide an explanatory framework for understanding the evolution of biological complexity? We argue here that the physical foundation for understanding the origin and evolution of complexity can be envisaged at the interface between the theory of frustrated states resulting in pattern formation in glass-like media and the theory of self-organized criticality (SOC). On the one hand, SOC has been shown to emerge in spin glass systems of high dimensionality. On the other hand, SOC is often viewed as the most appropriate physical description of evolutionary transitions in biology. We unify these two faces of SOC by showing that emergence of complex features in biological evolution is typically, if not always, triggered by frustration that is caused by competing interactions at different organizational levels. Competing interactions and frustrated states permeate biology at all organizational levels and are tightly linked to the ubiquitous competition for limiting resources. This perspective extends from the comparatively simple phenomena occurring in glasses to large-scale events of biological evolution, such as major evolutionary transitions. We therefore submit that frustration caused by competing interactions in multidimensional systems is the general driving force behind the emergence of complexity, within and beyond the domain of biology. Comment: 27 pages, 2 figures
- Published
- 2018
- Full Text
- View/download PDF
44. Ongoing global and regional adaptive evolution of SARS-CoV-2
- Author
-
Rochman, Nash D., Wolf, Yuri I., Faure, Guilhem, Mutz, Pascal, Zhang, Feng, and Koonin, Eugene V.
- Published
- 2021
45. Host age structure reshapes parasite symbiosis: collaboration begets pathogens, competition begets virulent mutualists
- Author
-
Portner, Carsten O. S., Rong, Edward G., Ramirez, Jared A., Wolf, Yuri I., Bosse, Angelique P., Koonin, Eugene V., and Rochman, Nash D.
- Published
- 2022
- Full Text
- View/download PDF
46. Analysis of lineage-specific protein family variability in prokaryotes combined with evolutionary reconstructions
- Author
-
Karamycheva, Svetlana, Wolf, Yuri I., Persi, Erez, Koonin, Eugene V., and Makarova, Kira S.
- Published
- 2022
- Full Text
- View/download PDF
47. Phylogenomic analysis of the diversity of graspetides and proteins involved in their biosynthesis
- Author
-
Makarova, Kira S., Blackburne, Brittney, Wolf, Yuri I., Nikolskaya, Anastasia, Karamycheva, Svetlana, Espinoza, Marlene, Barry, Clifton E., III, Bewley, Carole A., and Koonin, Eugene V.
- Published
- 2022
- Full Text
- View/download PDF
48. Towards physical principles of biological evolution
- Author
-
Katsnelson, Mikhail I., Wolf, Yuri I., and Koonin, Eugene V.
- Subjects
Quantitative Biology - Other Quantitative Biology ,Condensed Matter - Statistical Mechanics ,Quantitative Biology - Populations and Evolution - Abstract
Biological systems reach organizational complexity that far exceeds the complexity of any known inanimate objects. Biological entities undoubtedly obey the laws of quantum physics and statistical mechanics. However, is modern physics sufficient to adequately describe, model and explain the evolution of biological complexity? Detailed parallels have been drawn between statistical thermodynamics and the population-genetic theory of biological evolution. Based on these parallels, we outline new perspectives on biological innovation and major transitions in evolution, and introduce a biological equivalent of thermodynamic potential that reflects the innovation propensity of an evolving population. Deep analogies have been suggested to also exist between the properties of biological entities and processes, and those of frustrated states in physics, such as glasses. We extend such analogies by examining frustration-type phenomena, such as conflicts between different levels of selection, in biological evolution. We further address evolution in multidimensional fitness landscapes from the point of view of percolation theory and suggest that percolation at a level above the critical threshold dictates the tree-like evolution of complex organisms. Taken together, these multiple connections between fundamental processes in physics and biology imply that construction of a meaningful physical theory of biological evolution might not be a futile effort. Comment: Invited article, Focus Issue on 21st Century Frontiers, final version
- Published
- 2017
- Full Text
- View/download PDF
49. Long range segmentation of prokaryotic genomes by gene age and functionality.
- Author
-
Wolf, Yuri I, Schurov, Ilya V, Makarova, Kira S, Katsnelson, Mikhail I, and Koonin, Eugene V
- Published
- 2024
- Full Text
- View/download PDF
50. Evolution in the weak-mutation limit: Stasis periods punctuated by fast transitions between saddle points on the fitness landscape
- Author
-
Bakhtin, Yuri, Katsnelson, Mikhail I., Wolf, Yuri I., and Koonin, Eugene V.
- Published
- 2021