RecoverY: k-mer-based read classification for Y-chromosome-specific sequencing and assembly
- Source :
- Bioinformatics, Oxford University Press (OUP), 2017, ⟨10.1093/bioinformatics/btx771⟩
- Publication Year :
- 2017
- Publisher :
- HAL CCSD, 2017.
Abstract
- Motivation: The haploid mammalian Y chromosome is usually under-represented in genome assemblies because of its high repeat content and the low sequencing depth that results from its haploid nature. One strategy to ameliorate the low coverage of Y sequences is to experimentally enrich Y-specific material before assembly. As the enrichment process is imperfect, algorithms are needed to identify putative Y-specific reads prior to downstream assembly. A strategy that uses k-mer abundances to identify such reads was used to assemble the gorilla Y. However, the strategy required the manual setting of key parameters, a time-consuming process leading to sub-optimal assemblies. Results: We develop a method, RecoverY, that selects Y-specific reads by automatically choosing the abundance level at which a k-mer is deemed to originate from the Y. This algorithm uses prior knowledge about the Y chromosome of a related species or known Y transcript sequences. We evaluate RecoverY on both simulated and real data, for human and gorilla, and investigate its robustness to important parameters. We show that RecoverY leads to a vastly superior assembly compared to alternative strategies of filtering the reads or contigs. Compared to the preliminary strategy used by Tomaszkiewicz et al., we achieve a 33% improvement in assembly size and a 20% improvement in the NG50, demonstrating the power of automatic parameter selection. Availability and implementation: Our tool RecoverY is freely available at https://github.com/makovalab-psu/RecoverY. Supplementary information: Supplementary data are available at Bioinformatics online.
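The read-selection idea described in the abstract can be illustrated with a minimal Python sketch. This is not RecoverY's implementation. The function names, the `quantile` rule for picking the abundance cutoff, and the `min_fraction` rule for accepting a read are all hypothetical choices made for illustration, and reverse complements are ignored for brevity. The sketch assumes only what the abstract states: k-mer abundances are computed from the enriched reads, and known Y sequence (from a related species or from Y transcripts) guides the automatic choice of the abundance level.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def choose_threshold(counts, known_y_seqs, k, quantile=0.05):
    """Pick an abundance cutoff from k-mers shared with known Y sequence.

    Assumption (hypothetical rule): the cutoff is a low quantile of the
    abundances observed for k-mers that also occur in the known Y sequences.
    """
    trusted = sorted(
        counts[km]
        for seq in known_y_seqs
        for km in set(kmers(seq, k))
        if km in counts
    )
    if not trusted:
        raise ValueError("no k-mers shared with the known Y sequences")
    return trusted[int(quantile * (len(trusted) - 1))]


def select_reads(reads, counts, threshold, k, min_fraction=0.5):
    """Keep reads in which enough k-mers reach the abundance cutoff.

    Assumption (hypothetical rule): a read is kept when at least
    min_fraction of its k-mers have abundance >= threshold.
    """
    selected = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read shorter than k
        hits = sum(1 for km in read_kmers if counts[km] >= threshold)
        if hits / len(read_kmers) >= min_fraction:
            selected.append(read)
    return selected


# Toy data: five copies of an enriched "Y" read and one contaminant read.
reads = ["ACGTAC"] * 5 + ["GGGCCC"]
known_y = ["ACGTAC"]  # stand-in for a related species' Y or Y transcripts
k = 3

counts = count_kmers(reads, k)
threshold = choose_threshold(counts, known_y, k)
y_reads = select_reads(reads, counts, threshold, k)
```

In this toy input the enriched reads share high-abundance k-mers, so they pass the cutoff while the single contaminant read does not. Real data would need a dedicated k-mer counter and canonical (strand-independent) k-mers.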
- Subjects :
- 0106 biological sciences
0301 basic medicine
Statistics and Probability
Male
Computer science
Sequence analysis
Computational biology
Y chromosome
01 natural sciences
Biochemistry
Genome
03 medical and health sciences
Y Chromosome
Animals
Humans
Molecular Biology
ComputingMilieux_MISCELLANEOUS
Mammals
Gorilla gorilla
Contig
Chromosome
Robustness (evolution)
High-Throughput Nucleotide Sequencing
Genomics
Sequence Analysis, DNA
Original Papers
Chromosomes, Mammalian
Computer Science Applications
Computational Mathematics
030104 developmental biology
Computational Theory and Mathematics
k-mer
Read Classification
Ploidy
[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
Algorithms
Software
010606 plant biology & botany
Details
- Language :
- English
- ISSN :
- 1367-4803, 1367-4811, and 1460-2059
- Database :
- OpenAIRE
- Journal :
- Bioinformatics, Oxford University Press (OUP), 2017, ⟨10.1093/bioinformatics/btx771⟩
- Accession number :
- edsair.doi.dedup.....3ca23ef25644c4274bc099cd6912b032