From contigs towards chromosomes: automatic Improvement of Long Read Assemblies (ILRA)

Authors :
Ruiz, José L.
Reimering, Susanne
Sanders, Mandy
Escobar-Prieto, Juan David
Brancucci, Nicolas M. B.
Echeverry, Diego F.
Abdi, Abdirahman I.
Marti, Matthias
Gómez-Díaz, Elena
Otto, Thomas D.
Publication Year :
2023
Publisher :
Zenodo, 2023.

Abstract

Recent advances in long read technologies not only enable large consortia to aim to sequence all eukaryotes on Earth, but also allow individual laboratories to sequence their species of interest with relatively low investment. Long read technologies promise “perfect genomes”, but in practice the number of contigs often far exceeds the number of chromosomes, and the assemblies contain many insertion and deletion errors around homopolymer tracks. To overcome these issues, we implemented the ILRA pipeline to correct long read-based assemblies: contigs are reordered, renamed, merged, circularized, or filtered if erroneous or contaminated, and Illumina reads are used to correct homopolymer errors. We successfully tested our approach by assembling the genomes of four novel Plasmodium falciparum field samples and by applying it to existing assemblies of human, Trypanosoma brucei, and Leptosphaeria spp. We found that correcting homopolymer tracks reduced the number of genes incorrectly annotated as pseudogenes, but iterative correction seemed to be required to correct larger numbers of homopolymer errors. In summary, we describe and compare the performance of a new software tool that improved the quality of novel long read assemblies and can be used to correct small- and medium-sized genomes of up to 1 Gbp.

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....b0b82e7ffd1c40d786467fbf0aa79ab7
Full Text :
https://doi.org/10.5281/zenodo.7516750