GECKO is a genetic algorithm to classify and explore high throughput sequencing data
- Source :
- Communications Biology, Nature Publishing Group, 2019, 2 (1), pp. 1-8. ⟨10.1038/s42003-019-0456-9⟩
- Publication Year :
- 2019
- Publisher :
- HAL CCSD, 2019.
Abstract
- Comparative analysis of high-throughput sequencing data between multiple conditions often involves mapping sequencing reads to a reference, followed by downstream bioinformatics analyses. Both steps may introduce heavy bias and potential data loss, especially in studies where patient transcriptomes or genomes may vary from their references, such as in cancer. Here we describe a novel approach and associated software that uses advances in genetic algorithms and feature selection to comprehensively explore massive volumes of sequencing data, classifying and discovering new sequences of interest without a mapping step and without intensive use of specialized bioinformatics pipelines. We demonstrate that our approach, called GECKO (GEnetic Classification using k-mer Optimization), is effective at classifying and extracting meaningful sequences from multiple types of sequencing data, including mRNA, microRNA, and DNA methylome data.
- Aubin Thomas, Sylvain Barriere et al. present a computational method for classifying and extracting meaningful sequences from high-throughput sequencing data. The method, called GECKO, uses k-mer counts to classify the input data with high accuracy.
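The idea summarized in the abstract (count k-mers per sample, then let a genetic algorithm search for a small k-mer subset that separates the conditions) can be illustrated with a minimal Python sketch. This is not the GECKO software or its published procedure: the function names `kmer_counts`, `build_matrix`, `fitness` and `evolve`, the toy reads, the nearest-centroid classifier used for scoring, and all parameter values are assumptions chosen only for illustration.

```python
import random
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-length substring across one sample's reads (no mapping step)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def build_matrix(samples, k):
    """Return the sorted k-mer list and one row of counts per sample."""
    per_sample = [kmer_counts(reads, k) for reads in samples]
    kmers = sorted(set().union(*per_sample))
    return kmers, [[c[km] for km in kmers] for c in per_sample]

def fitness(subset, rows, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier (an assumed
    stand-in scorer) restricted to the k-mer columns listed in `subset`."""
    correct = 0
    for i, row in enumerate(rows):
        centroids = {}
        for label in sorted(set(labels)):
            members = [rows[j] for j in range(len(rows)) if j != i and labels[j] == label]
            if members:
                centroids[label] = [sum(m[c] for m in members) / len(members) for c in subset]
        point = [row[c] for c in subset]
        predicted = min(centroids, key=lambda lab: sum(
            (p - q) ** 2 for p, q in zip(point, centroids[lab])))
        correct += predicted == labels[i]
    return correct / len(rows)

def evolve(rows, labels, n_features, subset_size=2, pop_size=20, generations=15, seed=0):
    """Toy genetic algorithm: each individual is a list of k-mer column indices."""
    rng = random.Random(seed)
    population = [rng.sample(range(n_features), subset_size) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda s: fitness(s, rows, labels), reverse=True)
        parents = ranked[:pop_size // 2]          # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = list(dict.fromkeys(a + b))     # crossover: draw from both parents
            child = rng.sample(pool, subset_size)
            unused = [c for c in range(n_features) if c not in child]
            if unused and rng.random() < 0.3:     # mutation: swap in an unused k-mer
                child[rng.randrange(subset_size)] = rng.choice(unused)
            children.append(child)
        population = parents + children
    best = max(population, key=lambda s: fitness(s, rows, labels))
    return best, fitness(best, rows, labels)

# Hypothetical toy input: two samples per condition, each a list of reads.
samples = [["AAAAAA", "AAAAAC"], ["AAAAAA"], ["GGGGGG"], ["GGGGGT"]]
labels = ["A", "A", "B", "B"]
kmers, rows = build_matrix(samples, k=3)
best, score = evolve(rows, labels, n_features=len(kmers))
selected_kmers = [kmers[i] for i in best]
```

Here `selected_kmers` holds the sequences the search settled on, which is the sense in which such a method can surface sequences of interest without aligning reads to a reference. Real data would require far larger populations, count normalization and held-out validation.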
- Subjects :
- Computer science
Medicine (miscellaneous)
Genome
F30 - Plant genetics and breeding
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
0302 clinical medicine
Software
DNA sequence
Machine learning
lcsh:QH301-705.5
0303 health sciences
U10 - Computer science, mathematics and statistics
High-Throughput Nucleotide Sequencing
030220 oncology & carcinogenesis
Bioinformatics
General Agricultural and Biological Sciences
Algorithms
Data Structures and Algorithms
Bioinformatics
Sequencing data
Algorithms and data structures
[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]
Predictive medicine
Breast Neoplasms
Feature selection
Computational biology
Data loss
Article
General Biochemistry, Genetics and Molecular Biology
DNA sequencing
03 medical and health sciences
Bioinformatics
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
Machine learning
Genetic algorithm
Humans
RNA, Messenger
030304 developmental biology
Blood Cells
Computational Biology
DNA Methylation
MicroRNAs
lcsh:Biology (General)
Medical sciences
Mutation
[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
Details
- Language :
- English
- ISSN :
- 2399-3642
- Database :
- OpenAIRE
- Journal :
- Communications Biology, Nature Publishing Group, 2019, 2 (1), pp. 1-8
- Accession number :
- edsair.doi.dedup.....ac9f2e2f35cec24f2cd87e931f35179e
- Full Text :
- https://doi.org/10.1038/s42003-019-0456-9