RecoverY: k-mer-based read classification for Y-chromosome-specific sequencing and assembly

Authors :: Kateryna D. Makova
Rayan Chikhi
Marta Tomaszkiewicz
Monika Cechova
Samarth Rangavittal
Paul Medvedev
Robert S. Harris
Pennsylvania State University (Penn State)
Penn State System
Department of Anaesthesia
St George's Hospital
Institut de Génomique Fonctionnelle de Lyon (IGFL)
École normale supérieure - Lyon (ENS Lyon)-Institut National de la Recherche Agronomique (INRA)-Université Claude Bernard Lyon 1 (UCBL)
Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS)
Dept. of Computer Science and Engineering
Penn State System-Penn State System
University of Pennsylvania [Philadelphia]
École normale supérieure de Lyon (ENS de Lyon)-Institut National de la Recherche Agronomique (INRA)-Université Claude Bernard Lyon 1 (UCBL)
University of Pennsylvania
Source :: Bioinformatics, Oxford University Press (OUP), 2017, ⟨10.1093/bioinformatics/btx771⟩
Publication Year :: 2017
Publisher :: HAL CCSD, 2017.
Abstract: Motivation: The haploid mammalian Y chromosome is usually under-represented in genome assemblies because of its high repeat content and the low sequencing depth that results from its haploid nature. One strategy to ameliorate the low coverage of Y sequences is to experimentally enrich Y-specific material before assembly. As the enrichment process is imperfect, algorithms are needed to identify putative Y-specific reads prior to downstream assembly. A strategy that uses k-mer abundances to identify such reads was used to assemble the gorilla Y. However, the strategy required the manual setting of key parameters, a time-consuming process leading to sub-optimal assemblies.

Results: We develop a method, RecoverY, that selects Y-specific reads by automatically choosing the abundance level at which a k-mer is deemed to originate from the Y. This algorithm uses prior knowledge about the Y chromosome of a related species or known Y transcript sequences. We evaluate RecoverY on both simulated and real data, for human and gorilla, and investigate its robustness to important parameters. We show that RecoverY leads to a vastly superior assembly compared to alternative strategies of filtering the reads or contigs. Compared to the preliminary strategy used by Tomaszkiewicz et al., we achieve a 33% improvement in assembly size and a 20% improvement in the NG50, demonstrating the power of automatic parameter selection.

Availability and implementation: Our tool RecoverY is freely available at https://github.com/makovalab-psu/RecoverY.

Supplementary information: Supplementary data are available at Bioinformatics online.
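The general idea described in the abstract, keeping reads whose k-mers are abundant in a Y-enriched dataset, can be illustrated with a minimal Python sketch. This is not the RecoverY implementation: the function names, the `min_fraction` and `quantile` parameters, and the percentile-based threshold heuristic are all assumptions made for illustration. The abstract states only that the abundance level is chosen automatically using prior knowledge of Y sequences, not how.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def choose_threshold(counts, known_y_seqs, k, quantile=0.05):
    """Pick an abundance threshold from k-mers found in known Y sequences.

    Hypothetical heuristic (an assumption, not the published procedure):
    take a low quantile of the abundances of reference Y k-mers that
    also occur in the reads.
    """
    abundances = sorted(
        counts[km]
        for seq in known_y_seqs
        for km in set(kmers(seq, k))
        if km in counts
    )
    if not abundances:
        raise ValueError("no reference k-mers were observed in the reads")
    index = min(int(quantile * len(abundances)), len(abundances) - 1)
    return abundances[index]


def classify_reads(reads, counts, k, threshold, min_fraction=0.5):
    """Keep reads in which enough k-mers reach the abundance threshold."""
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read shorter than k
        abundant = sum(1 for km in read_kmers if counts[km] >= threshold)
        if abundant / len(read_kmers) >= min_fraction:
            kept.append(read)
    return kept


if __name__ == "__main__":
    # Toy data: five copies of an "enriched" read and one contaminant.
    reads = ["ACGTACGTAC"] * 5 + ["TTGGCCAATT"]
    counts = count_kmers(reads, 4)
    threshold = choose_threshold(counts, ["ACGTACG"], 4)
    print(threshold, classify_reads(reads, counts, 4, threshold))
```

In this toy setting the contaminant read is dropped because none of its k-mers reach the threshold derived from the reference sequence. Real data would call for a dedicated k-mer counter and canonical (strand-independent) k-mers, both of which this sketch omits.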

Details

Language :: English
ISSN :: 13674803, 13674811, and 14602059
Database :: OpenAIRE
Journal :: Bioinformatics, Oxford University Press (OUP), 2017, ⟨10.1093/bioinformatics/btx771⟩
Accession number :: edsair.doi.dedup.....3ca23ef25644c4274bc099cd6912b032
