Producing polished prokaryotic pangenomes with the Panaroo pipeline

Authors :
Aaron Weimann
Stephanie W. Lo
Neil MacAlasdair
John A. Lees
Rebecca A. Gladstone
Simon D. W. Frost
Gerry Tonkin-Hill
Christopher A. Beaudoin
Gal Horesh
Julian Parkhill
Jukka Corander
R. Andres Floto
Christopher Ruis
Stephen D. Bentley
Tonkin-Hill, Gerry [0000-0003-4397-2224]
Apollo - University of Cambridge Repository
Helsinki Institute for Information Technology
Jukka Corander / Principal Investigator
Department of Mathematics and Statistics
Biostatistics Helsinki
Source :
Genome Biology, Vol 21, Iss 1, Pp 1-21 (2020)
Publication Year :
2021
Publisher :
Apollo - University of Cambridge Repository, 2021.

Abstract

Population-level comparisons of prokaryotic genomes must take into account the substantial differences in gene content resulting from frequent horizontal gene transfer, gene duplication and gene loss. However, the automated annotation of prokaryotic genomes is imperfect, and errors due to fragmented assemblies, contamination, diverse gene families and mis-assemblies accumulate over the population, leading to profound consequences when analysing the set of all genes found in a species. Here we introduce Panaroo, a graph-based pangenome clustering tool that is able to account for many of the sources of error introduced during the annotation of prokaryotic genome assemblies. We verified our approach through extensive simulations of de novo assemblies using the infinitely many genes model and by analysing a number of publicly available large bacterial genome datasets. Using a highly clonal Mycobacterium tuberculosis dataset as a negative control case, we show that failing to account for annotation errors can lead to pangenome estimates that are dominated by error. We additionally demonstrate the utility of the improved graphical output provided by Panaroo by performing a pan-genome-wide association study in Neisseria gonorrhoeae and by analysing gene gain and loss rates across 51 of the major global pneumococcal sequence clusters. Panaroo is freely available under an open-source MIT licence at https://github.com/gtonkinhill/panaroo.

Details

ISSN :
1474760X
Database :
OpenAIRE
Journal :
Genome Biology, Vol 21, Iss 1, Pp 1-21 (2020)
Accession number :
edsair.doi.dedup.....2e0dec012f7becb2f05c723fee4a161e
Full Text :
https://doi.org/10.17863/cam.73282