GeneMark-HM: improving gene prediction in DNA sequences of human microbiome
- Source :
- NAR Genomics and Bioinformatics
- Publication Year :
- 2021
- Publisher :
- Oxford University Press (OUP), 2021.
Abstract
- Computational reconstruction of nearly complete genomes from metagenomic reads may identify thousands of new uncultured candidate bacterial species. We have shown that reconstructed prokaryotic genomes, along with genomes of sequenced microbial isolates, can be used to support more accurate gene prediction in novel metagenomic sequences. We have proposed an approach that used three types of gene prediction algorithms and found, for every contig in a metagenome, a nearly optimal model of protein-coding regions, either selected from libraries of pre-computed models or constructed de novo. Model selection and gene annotation were done by the new GeneMark-HM pipeline. We have created a database of species-level pan-genomes for the human microbiome. To create a library of models representing each pan-genome, we used the self-training algorithm GeneMarkS-2. Genes initially predicted in each contig served as queries for a fast similarity search through the pan-genome database, and the best matches determined which model was selected for gene prediction. Contigs not assigned to any pan-genome were analyzed by crude but still accurate models designed for sequences with particular GC compositions. Tests of GeneMark-HM on simulated metagenomes demonstrated improved gene annotation of human metagenomic sequences in comparison with current state-of-the-art gene prediction tools.
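The model-selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the GeneMark-HM implementation. The function names, the majority-vote rule, the `min_hits` and `min_fraction` thresholds, and the one-percent GC bins are all assumptions made for illustration. The abstract states only that the best matches led to model selection and that unassigned contigs were handled by GC-specific models.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / acgt


def select_model(contig_seq, hit_pan_genomes, min_hits=3, min_fraction=0.5):
    """Choose a gene-prediction model for one contig.

    hit_pan_genomes: pan-genome IDs (hypothetical labels) of the best
    similarity-search match for each initially predicted gene.
    Returns ("pan_genome", id) when one pan-genome clearly dominates the
    hits, otherwise ("gc_bin", percent) as the GC-specific fallback.
    The thresholds are illustrative assumptions, not published values.
    """
    if hit_pan_genomes:
        best_id, best_count = Counter(hit_pan_genomes).most_common(1)[0]
        dominant = best_count / len(hit_pan_genomes) >= min_fraction
        if best_count >= min_hits and dominant:
            return ("pan_genome", best_id)
    # No convincing assignment: fall back to a model keyed by GC percent.
    return ("gc_bin", round(100 * gc_content(contig_seq)))
```

In this sketch a contig whose predicted genes mostly hit a single pan-genome is assigned that pan-genome's pre-computed model, and any other contig is routed to the model for its GC bin.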
- Subjects :
- AcademicSubjects/SCI01140
0303 health sciences
AcademicSubjects/SCI01060
Contig
Gene prediction
AcademicSubjects/SCI00030
Standard Article
Gene Annotation
Computational biology
Biology
AcademicSubjects/SCI01180
Genome
DNA sequencing
03 medical and health sciences
0302 clinical medicine
Metagenomics
AcademicSubjects/SCI00980
030217 neurology & neurosurgery
Selection (genetic algorithm)
GC-content
030304 developmental biology
Details
- ISSN :
- 2631-9268
- Volume :
- 3
- Database :
- OpenAIRE
- Journal :
- NAR Genomics and Bioinformatics
- Accession number :
- edsair.doi.dedup.....1acd4d2d5ad7d105bf596bf668c1c66b
- Full Text :
- https://doi.org/10.1093/nargab/lqab047