Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer
- Source :
- BMC Bioinformatics, Vol 13, Iss 1, p 48 (2012), BioMed Central. ⟨10.1186/1471-2105-13-48⟩
- Publication Year :
- 2011
Abstract
- Background: The analysis of next-generation sequencing data from large genomes is a timely research topic. Sequencers are producing billions of short sequence fragments from newly sequenced organisms. Computational methods for reconstructing whole genomes/transcriptomes (de novo assemblers) are typically employed to process such data. However, these methods require large memory resources and computation time. Many basic biological questions could be answered by targeting specific information in the reads, thus avoiding complete assembly.
- Results: We present Mapsembler, an iterative micro and targeted assembler which processes large datasets of reads on commodity hardware. Mapsembler checks for the presence of given regions of interest that can be constructed from reads and builds a short assembly around each of them, either as a plain sequence or as a graph showing contextual structure. We introduce new algorithms to retrieve approximate occurrences of a sequence from reads and to construct an extension graph. Among other results presented in this paper, Mapsembler made it possible to retrieve previously described human breast cancer candidate fusion genes and to detect new ones.
- Conclusions: Mapsembler is the first software that enables de novo discovery, around a region of interest, of repeats, SNPs, exon skipping, gene fusion, and other structural events, directly from raw sequencing reads. As indexing is localized, the memory footprint of Mapsembler is negligible. Mapsembler is released under the CeCILL license and can be freely downloaded from http://alcovna.genouest.org/mapsembler/.
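The idea of targeted assembly described in the abstract (check that a region of interest is supported by the reads, then build a short assembly around it) can be illustrated with a toy sketch. This is not Mapsembler's algorithm: it assumes exact k-mer matches only, whereas the abstract mentions approximate occurrences, and it stops at the first branch instead of building an extension graph. The function names `is_covered` and `extend_right`, the parameter `k` (assumed to be at least 2), and the example reads are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def is_covered(starter, reads, k):
    """Return True if every k-mer of the starter occurs in some read (exact match)."""
    read_kmers = {km for read in reads for km in kmers(read, k)}
    return all(km in read_kmers for km in kmers(starter, k))


def extend_right(starter, reads, k, max_len=50):
    """Greedily extend the starter to the right, one base at a time.

    Stops at a dead end, at a branch (more than one candidate next base),
    or after max_len added bases.
    """
    # Map each (k-1)-mer to the bases observed right after it in the reads.
    followers = {}
    for read in reads:
        for km in kmers(read, k):
            followers.setdefault(km[:-1], Counter())[km[-1]] += 1

    contig = starter
    for _ in range(max_len):
        nxt = followers.get(contig[-(k - 1):])
        if not nxt or len(nxt) != 1:
            break  # dead end or ambiguous extension
        contig += next(iter(nxt))
    return contig


# Hypothetical reads overlapping a short made-up sequence.
reads = ["ACGTTG", "GTTGCA", "TGCATG", "CATGGA"]
if is_covered("ACGTT", reads, k=4):
    print(extend_right("ACGTT", reads, k=4))
```

Only the reads that share k-mers with the starter matter for the extension, which is the intuition behind the abstract's remark that localized indexing keeps the memory footprint small; this sketch indexes all reads for simplicity.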
- Subjects :
- Sequence analysis
0206 medical engineering
Genomics
Breast Neoplasms
02 engineering and technology
Biology
computer.software_genre
lcsh:Computer applications to medicine. Medical informatics
Genome
Biochemistry
Polymorphism, Single Nucleotide
03 medical and health sciences
Software
Structural Biology
[SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN]
Humans
Oncogene Fusion
lcsh:QH301-705.5
Molecular Biology
030304 developmental biology
Genetics
0303 health sciences
business.industry
Computers
Applied Mathematics
Search engine indexing
High-Throughput Nucleotide Sequencing
Sequence Analysis, DNA
[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]
Computer Science Applications
lcsh:Biology (General)
Memory footprint
Graph (abstract data type)
lcsh:R858-859.7
Female
Data mining
[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
business
Transcriptome
computer
020602 bioinformatics
Algorithms
Reference genome
Research Article
Details
- ISSN :
- 1471-2105
- Volume :
- 13
- Database :
- OpenAIRE
- Journal :
- BMC Bioinformatics
- Accession number :
- edsair.doi.dedup.....ea6048f01993e0bd592018dcbd8231f2