
Benchmarking of long-read assemblers for prokaryote whole genome sequencing [version 2; peer review: 4 approved]

Authors :
Ryan R. Wick
Kathryn E. Holt
Author Affiliations :
1 Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
2 Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, WC1E 7HT, UK
Source :
F1000Research. 8:2138
Publication Year :
2020
Publisher :
London, UK: F1000 Research Limited, 2020.

Abstract

Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow most prokaryote genomes to be completely assembled: one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly.

Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of seven long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and the computational resources used.

Results: Canu v1.9 produced moderately reliable assemblies but had the longest runtimes of all assemblers tested. Flye v2.7 was more reliable and did particularly well with plasmid assembly. Miniasm/Minipolish v0.3 and NECAT v20200119 were the most likely to produce clean contig circularisation. Raven v0.0.8 was the most reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.4.0 were computationally efficient but more likely to produce incomplete assemblies.

Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development of long-read assembly algorithms.
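
As a concrete illustration of the runtime and memory benchmarking described above, the sketch below times two of the compared assemblers (Flye and Raven) on a single read set. This is a minimal sketch, not the study's actual pipeline: the input file name (reads.fastq.gz) is a placeholder, and the flags shown are each tool's basic documented options rather than the paper's exact commands.

```python
#!/usr/bin/env python3
"""Minimal sketch of one benchmark iteration: run a long-read assembler
and record wall-clock runtime and peak child-process memory (Unix only)."""

import resource
import subprocess
import time

def time_assembler(cmd, stdout_path=None):
    """Run one assembler command; return (runtime_seconds, peak_rss_kb)."""
    start = time.monotonic()
    out = open(stdout_path, "w") if stdout_path else subprocess.DEVNULL
    subprocess.run(cmd, stdout=out, check=True)
    if stdout_path:
        out.close()
    runtime = time.monotonic() - start
    # Caveat: ru_maxrss for RUSAGE_CHILDREN is a running maximum over all
    # finished children, so a real benchmark would measure each assembler
    # in a separate process (e.g. via GNU time). Reported in KB on Linux.
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return runtime, peak_kb

# reads.fastq.gz is a placeholder input. Flye writes assembly.fasta into
# its --out-dir; Raven emits its assembly as FASTA on stdout.
results = {
    "flye": time_assembler(["flye", "--nano-raw", "reads.fastq.gz",
                            "--out-dir", "flye_out", "--threads", "8"]),
    "raven": time_assembler(["raven", "-t", "8", "reads.fastq.gz"],
                            stdout_path="raven.fasta"),
}
for name, (runtime, peak_kb) in results.items():
    print(f"{name}: {runtime:.0f} s, peak memory {peak_kb / 1024:.0f} MB")
```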

Details

ISSN :
2046-1402
Volume :
8
Database :
F1000Research
Journal :
F1000Research
Notes :
Changes from Version 1: This version contains updated results for new versions of Flye (v2.7), Raven (v0.0.8) and Shasta (v0.4.0), and it adds a new assembler (NECAT v20200119) to the comparison. It also contains various small improvements made in response to the peer reviews. [version 2; peer review: 4 approved]
Publication Type :
Academic Journal
Accession number :
edsfor.10.12688.f1000research.21782.2
Document Type :
research-article
Full Text :
https://doi.org/10.12688/f1000research.21782.2