Producing polished prokaryotic pangenomes with the Panaroo pipeline
- Source :
- Genome Biology, Vol 21, Iss 1, Pp 1-21 (2020), Genome Biology
- Publication Year :
- 2021
- Publisher :
- Apollo - University of Cambridge Repository, 2021.
Abstract
- Population-level comparisons of prokaryotic genomes must take into account the substantial differences in gene content resulting from frequent horizontal gene transfer, gene duplication and gene loss. However, the automated annotation of prokaryotic genomes is imperfect, and errors due to fragmented assemblies, contamination, diverse gene families and mis-assemblies accumulate over the population, leading to profound consequences when analysing the set of all genes found in a species. Here we introduce Panaroo, a graph-based pangenome clustering tool that is able to account for many of the sources of error introduced during the annotation of prokaryotic genome assemblies. We verified our approach through extensive simulations of de novo assemblies using the infinitely many genes model and by analysing a number of publicly available large bacterial genome datasets. Using a highly clonal Mycobacterium tuberculosis dataset as a negative control case, we show that failing to account for annotation errors can lead to pangenome estimates that are dominated by error. We additionally demonstrate the utility of the improved graphical output provided by Panaroo by performing a pangenome-wide association study in Neisseria gonorrhoeae and by analysing gene gain and loss rates across 51 of the major global pneumococcal sequence clusters. Panaroo is freely available under an open-source MIT licence at https://github.com/gtonkinhill/panaroo.
- Subjects :
- GENES MODEL
Computer science
05 Environmental Sciences
PROTEIN
Genome
Pangenome
0302 clinical medicine
Gene duplication
111 Mathematics
lcsh:QH301-705.5
Genetics & Heredity
0303 health sciences
education.field_of_study
1184 Genetics, developmental biology, physiology
Genomics
Horizontal gene transfer
Biological Evolution
GENOME
ALIGNMENT
Klebsiella pneumoniae
Prokaryote
LIBRARY
Life Sciences & Biomedicine
Algorithms
lcsh:QH426-470
Bioinformatics
Population
Bacterial genome size
Computational biology
Biology
FREQUENCY
Clustering
03 medical and health sciences
Annotation
Drug Resistance, Bacterial
Gene family
ALGORITHM
education
Gene
030304 developmental biology
Science & Technology
IDENTIFICATION
Bacteria
030306 microbiology
Mycobacterium tuberculosis
06 Biological Sciences
Graph genomes
biology.organism_classification
lcsh:Genetics
Biotechnology & Applied Microbiology
lcsh:Biology (General)
08 Information and Computing Sciences
030217 neurology & neurosurgery
Genome, Bacterial
Software
GENERATION
Details
- ISSN :
- 1474-760X
- Database :
- OpenAIRE
- Journal :
- Genome Biology, Vol 21, Iss 1, Pp 1-21 (2020), Genome Biology
- Accession number :
- edsair.doi.dedup.....2e0dec012f7becb2f05c723fee4a161e
- Full Text :
- https://doi.org/10.17863/cam.73282