Pair consensus decoding improves accuracy of neural network basecallers for nanopore sequencing
- Source :
- Genome Biology, vol. 22, iss. 1, pp. 1-6 (2021)
- Publication Year :
- 2020
- Publisher :
- Cold Spring Harbor Laboratory, 2020.
Abstract
- Nanopore technology allows for direct sequencing of individual DNA duplexes. However, its higher error rate compared to other sequencing methods has limited its application in situations where deep coverage is unavailable, such as detection of rare variants or characterization of highly polymorphic samples. In principle, 2× coverage is available even for single duplexes, using Oxford Nanopore Technologies’ 1D² protocol or related methods which sequence both strands of the duplex consecutively. Using both strands should improve accuracy; however, most neural network basecaller architectures are designed to operate on single strands. We have developed a general approach for improving the accuracy of 1D² and related protocols by finding the consensus of two neural network basecaller outputs, combining a constrained profile-profile alignment with a heuristic variant of beam search. When run on a basecalling neural network we trained, our consensus algorithm improves median basecall accuracy from 86.2% (for single-read decoding) to 92.1% (for pair decoding). Our software can readily be adapted to work with the output of other basecallers, such as the recently released Bonito basecaller. Although Bonito operates only on individual strands and was not designed to leverage the 1D² protocol, our method lifts its median accuracy from 93.3% to 97.7%, more than halving the median error rate (from 6.7% to 2.3%). This surpasses the maximum accuracy achievable with Guppy, an alternative basecaller which was designed to include pair decoding of 1D² reads. Our software PoreOver, including both our neural network basecaller and our consensus pair decoder (which can be applied separately to improve other basecallers), is implemented in Python 3 and C++11 and is freely available at https://github.com/jordisr/poreover.
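As a rough illustration of what pair decoding means, the Python sketch below runs a joint CTC prefix beam search over two reads: each candidate prefix is scored by the product of its prefix probabilities under both reads, and the best `beam_width` prefixes are kept at every step. This is not PoreOver's implementation. It assumes CTC-style per-frame posteriors with the blank at index 0 and an `-ACGT` column order, assumes the complement read has been flipped into the template's orientation (the hypothetical helper `reverse_complement_posteriors` does this), and omits the constrained profile-profile alignment entirely. All function names are hypothetical, and probabilities stay in linear space for readability, which would underflow on real-length reads.

```python
import math

BLANK = 0
ALPHABET = "-ACGT"                            # assumed column order; index 0 is the CTC blank
COMPLEMENT = {0: 0, 1: 4, 2: 3, 3: 2, 4: 1}   # blank, A<->T, C<->G

def reverse_complement_posteriors(y):
    """Hypothetical helper: put a complement-strand read into template orientation."""
    return [[row[COMPLEMENT[k]] for k in range(len(row))] for row in reversed(y)]

def toy_posteriors(path, p=0.9):
    """Hypothetical helper: one confident frame per symbol of `path`, e.g. "A-CG"."""
    q = (1.0 - p) / (len(ALPHABET) - 1)
    return [[p if ALPHABET[k] == s else q for k in range(len(ALPHABET))] for s in path]

def log(p):
    return math.log(p) if p > 0 else float("-inf")

def empty_forward(y):
    """Forward arrays (ending in blank / ending in label) for the empty labeling."""
    T = len(y)
    fb, fn = [1.0] + [0.0] * T, [0.0] * (T + 1)
    for t in range(1, T + 1):
        fb[t] = fb[t - 1] * y[t - 1][BLANK]
    return fb, fn

def extend(y, fb, fn, last, c):
    """Extend a prefix ending in label `last` by label c.

    Returns the new forward arrays and the prefix probability, i.e. the
    probability that this read's CTC output starts with prefix + c."""
    T = len(y)
    nb, nn = [0.0] * (T + 1), [0.0] * (T + 1)
    prefix_prob = 0.0
    for t in range(1, T + 1):
        # c emitted as a *new* label at frame t (a repeat needs a blank in between)
        new = y[t - 1][c] * (fb[t - 1] + (fn[t - 1] if c != last else 0.0))
        prefix_prob += new
        nn[t] = new + y[t - 1][c] * nn[t - 1]
        nb[t] = y[t - 1][BLANK] * (nb[t - 1] + nn[t - 1])
    return nb, nn, prefix_prob

def pair_beam_search(y1, y2, beam_width=8, max_len=None):
    """Joint CTC prefix beam search over two same-orientation reads."""
    if max_len is None:
        max_len = min(len(y1), len(y2))       # a CTC labeling cannot exceed the frame count
    s1, s2 = empty_forward(y1), empty_forward(y2)
    beam = [((), None, s1, s2, 0.0)]          # (labels, last label, read-1 state, read-2 state, score)
    best_seq = ()
    best_score = log(s1[0][-1]) + log(s2[0][-1])
    for _ in range(max_len):
        candidates = []
        for labels, last, (fb1, fn1), (fb2, fn2), _ in beam:
            for c in range(1, len(y1[0])):
                nb1, nn1, p1 = extend(y1, fb1, fn1, last, c)
                nb2, nn2, p2 = extend(y2, fb2, fn2, last, c)
                score = log(p1) + log(p2)     # joint prefix score
                if score == float("-inf"):
                    continue
                new = labels + (c,)
                candidates.append((new, c, (nb1, nn1), (nb2, nn2), score))
                # joint probability that *both* outputs are exactly `new`
                full = log(nb1[-1] + nn1[-1]) + log(nb2[-1] + nn2[-1])
                if full > best_score:
                    best_seq, best_score = new, full
        candidates.sort(key=lambda e: e[4], reverse=True)
        beam = candidates[:beam_width]
        # a prefix score bounds every extension, so stop once none can beat the best
        if not beam or beam[0][4] <= best_score:
            break
    return "".join(ALPHABET[k] for k in best_seq), best_score

if __name__ == "__main__":
    template = toy_posteriors("ACG")
    complement = toy_posteriors("CG-T")       # reverse complement of "A-CG"
    seq, score = pair_beam_search(template, reverse_complement_posteriors(complement))
    print(seq, round(score, 3))
```

Because a prefix probability bounds the probability of every sequence that extends it, the loop can stop as soon as no prefix in the beam outscores the best complete sequence found so far. Truncating to `beam_width` candidates is what makes the search heuristic rather than exact. On the toy reads above it should recover `ACG`.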
- Subjects :
- Consensus; lcsh:QH426-470; Neural Networks; Bioinformatics; Computer science; 0206 medical engineering; Short Report; Bioengineering; Computational biology; 02 engineering and technology; Biology; Nanopores; Computer; 03 medical and health sciences; chemistry.chemical_compound; 0302 clinical medicine; Software; Information and Computing Sciences; Humans; Nanotechnology; lcsh:QH301-705.5; 030304 developmental biology; 0303 health sciences; Artificial neural network; business.industry; Biological Sciences; lcsh:Genetics; Nanopore Sequencing; Nanopore; lcsh:Biology (General); chemistry; 030220 oncology & carcinogenesis; Beam search; Neural Networks, Computer; Nanopore sequencing; business; Algorithm; Environmental Sciences; 020602 bioinformatics; Decoding methods; DNA
Details
- Database :
- OpenAIRE
- Journal :
- Genome Biology, vol. 22, iss. 1, pp. 1-6 (2021)
- Accession number :
- edsair.doi.dedup.....b29236aa32b413f6ae4927e9d5578d0e
- Full Text :
- https://doi.org/10.1101/2020.02.25.956771