Edouard Bingen, Olivier Clermont, Chantal Le Bouguénec, Zoé Rouy, Sophie Oztas, James R. Johnson, Anne-Marie Gilles, Erick Denamur, Marie Touchon, Olivier Tenaillon, David Vallenet, Valérie Barbe, Marie-Agnès Petit, Meriem El Karoui, Stéphane Bonacorsi, Dominique Schneider, Louis Garry, Eduardo P. C. Rocha, Alexandra Calteau, Antoine Danchin, Xavier Nassif, Ivan Matic, Vanessa Martinez-Jéhanne, Jérôme Tourret, Sophie Mangenot, Christophe Pichon, Jean Marc Ghigo, Claude Saint Ruf, Philippe Bidet, Claire Hoede, Eric Frapy, Christiane Bouchier, Simon Baeriswyl, Benoit Vacherie, Stéphane Cruveiller, Mathilde Lescat, Odile Bouvet, Carole Dossat, Claudine Médigue, Médéric Diard, Hélène Chiapello

Atelier de BioInformatique (ABI), Université Pierre et Marie Curie - Paris 6 (UPMC), Génomique évolutive des microbes, Institut Pasteur [Paris] (IP)-Centre National de la Recherche Scientifique (CNRS), Ecologie et Evolution des Microorganismes (EEM), Université Paris 13 (UP13)-Université Paris Diderot - Paris 7 (UPD7)-Université Sorbonne Paris Cité (USPC)-Institut National de la Santé et de la Recherche Médicale (INSERM), Genoscope - Centre national de séquençage [Evry] (GENOSCOPE), Université Paris-Saclay-Direction de Recherche Fondamentale (CEA) (DRF (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA), Génétique moléculaire, évolutive et médicale, IFR65-Université Paris Descartes - Paris 5 (UPD5)-Institut National de la Santé et de la Recherche Médicale (INSERM), Université Paris 7, Hôpital Robert Debré-Université Paris Diderot - Paris 7 (UPD7)-Institut National de la Santé et de la Recherche Médicale (INSERM), Génomique (Plate-Forme) - Genomics Platform, Institut Pasteur [Paris] (IP), Génomique métabolique (UMR 8030), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Direction de Recherche Fondamentale (CEA) (DRF (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université d'Évry-Val-d'Essonne (UEVE)-Centre National de la Recherche Scientifique (CNRS), Mathématique, Informatique, et Génomique, Institut National de la Recherche Agronomique (INRA), Génétique des Génomes Bactériens, Pathogénie des infections systémiques (UMR_S 570), Université Paris Descartes - Paris 5 (UPD5)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS), Génétique des Biofilms, Veterans Affairs Medical Center, Department of Medicine, University of Minnesota [Twin Cities] (UMN), University of Minnesota System, Pathogénie Bactérienne des Muqueuses, Unité des Bactéries Lactiques et Pathogènes Opportunistes, Laboratoire Adaptation et pathogénie des micro-organismes [Grenoble] (LAPM), Université Joseph Fourier - Grenoble 1 (UJF)-Centre National de la Recherche Scientifique (CNRS), Gagnon, Jean
The Escherichia coli species represents one of the best-studied model organisms, but also encompasses a variety of commensal and pathogenic strains that diversify by high rates of genetic change. We uniformly (re-)annotated the genomes of 20 commensal and pathogenic E. coli strains and one strain of E. fergusonii (the species most closely related to E. coli), including seven that we sequenced to completion. Within the ∼18,000 families of orthologous genes, we found ∼2,000 common to all strains. Although recombination rates are much higher than mutation rates, we show, both theoretically and using phylogenetic inference, that this does not obscure the phylogenetic signal, which places the B2 phylogenetic group and one group D strain at the basal position. Based on this phylogeny, we inferred past evolutionary events of gain and loss of genes, identifying functional classes under opposite selection pressures. We found an important adaptive role for the diversification of metabolism within group B2 and Shigella strains, but identified few or no extraintestinal virulence-specific genes, which could make it difficult to develop a vaccine against extraintestinal infections. Genome flux in E. coli is confined to a small number of conserved positions in the chromosome, which are most often not associated with integrases or tRNA genes. Core genes flanking some of these regions show higher rates of recombination, suggesting that a gene, once acquired by a strain, spreads within the species by homologous recombination at the flanking genes. Finally, the genome's long-scale structure of recombination indicates lower recombination rates, but not higher mutation rates, at the terminus of replication. The ensuing effect of background selection and biased gene conversion may thus explain why this region is A+T-rich and shows high sequence divergence but low sequence polymorphism.
Overall, despite a very high gene flow, genes co-exist in an organised genome.

Author Summary

Although abundant knowledge has been accumulated regarding the E. coli laboratory strain K-12, little is known about the evolutionary trajectories that have driven the high diversity observed among natural isolates of the species, which encompass both commensal and highly virulent intestinal and extraintestinal pathogenic strains. We have annotated or re-annotated the genomes of 20 commensal and pathogenic E. coli strains and one strain of E. fergusonii (the species most closely related to E. coli), including seven that we sequenced to completion. Although recombination rates are much higher than mutation rates, we were able to reconstruct a robust phylogeny based on the ∼2,000 genes common to all strains. Based on this phylogeny, we established the evolutionary scenario of gains and losses of thousands of specific genes, identifying functional classes under opposite selection pressures. This genome flux is confined to very few positions in the chromosome, which are the same for every genome. Notably, we identified few or no extraintestinal virulence-specific genes. We also defined a long-scale structure of recombination in the genome, with lower recombination rates at the terminus of replication. These findings demonstrate that, despite a very high gene flow, genes can co-exist in an organised genome.