
GeneMark-HM: improving gene prediction in DNA sequences of human microbiome

Authors :
Alexandre Lomsadze
Christophe Bonny
Mark Borodovsky
Francesco Strozzi
Source :
NAR Genomics and Bioinformatics
Publication Year :
2021
Publisher :
Oxford University Press (OUP), 2021.

Abstract

Computational reconstruction of nearly complete genomes from metagenomic reads may identify thousands of new, uncultured candidate bacterial species. We have shown that reconstructed prokaryotic genomes, together with genomes of sequenced microbial isolates, can support more accurate gene prediction in novel metagenomic sequences. We proposed an approach that used three types of gene prediction algorithms and found, for every contig in a metagenome, a nearly optimal model of protein-coding regions, either selected from a library of pre-computed models or constructed de novo. Model selection and gene annotation were performed by the new GeneMark-HM pipeline. We created a database of species-level pan-genomes for the human microbiome and used the self-training algorithm GeneMarkS-2 to build a library of models representing each pan-genome. Genes initially predicted in each contig served as queries for a fast similarity search against the pan-genome database, and the best matches determined which model was selected for gene prediction. Contigs not assigned to any pan-genome were analyzed with crude but still accurate models designed for sequences of particular GC composition. Tests of GeneMark-HM on simulated metagenomes demonstrated improved gene annotation of human metagenomic sequences compared with current state-of-the-art gene prediction tools.
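The per-contig model choice described in the abstract (vote by best similarity matches, otherwise fall back to a GC-specific model) can be illustrated with a minimal sketch. This is not the GeneMark-HM implementation: the abstract does not state the assignment criterion, the similarity search tool, or the GC bin boundaries. The function names (`gc_percent`, `gc_bin_model`, `select_model`), the majority-vote rule with a 50% threshold, and the 5-percentage-point GC bins are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical GC bin width (percentage points) for the fallback models.
GC_BIN_WIDTH = 5


def gc_percent(seq: str) -> int:
    """Integer GC percentage over unambiguous A/C/G/T bases (0 if none)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        return 0
    return 100 * (s.count("G") + s.count("C")) // acgt


def gc_bin_model(seq: str) -> str:
    """Name of a hypothetical GC-specific fallback model, e.g. 'gc45-50'."""
    lo = min(gc_percent(seq) // GC_BIN_WIDTH * GC_BIN_WIDTH, 100 - GC_BIN_WIDTH)
    return f"gc{lo}-{lo + GC_BIN_WIDTH}"


def select_model(contig_seq: str,
                 gene_hits: List[Optional[str]],
                 min_fraction: float = 0.5) -> str:
    """Choose a model for one contig.

    gene_hits holds, for each gene initially predicted in the contig, the
    pan-genome ID of its best similarity-search match, or None if the gene
    had no match. Assumption: the contig is assigned to the pan-genome that
    wins a simple majority vote covering at least min_fraction of its genes.
    """
    hits = [h for h in gene_hits if h is not None]
    if hits:
        pangenome, votes = Counter(hits).most_common(1)[0]
        if votes / len(gene_hits) >= min_fraction:
            return f"pangenome:{pangenome}"
    # Contig not assigned to any pan-genome: use a GC-specific model.
    return gc_bin_model(contig_seq)
```

For example, a contig whose four predicted genes best match `sp1`, `sp1`, `sp2` and nothing would be assigned to `sp1` (two of four votes meets the assumed 50% threshold), whereas a contig with no matches falls back to the model for its GC bin. A real pipeline would likely weigh match scores rather than count votes; the simple count keeps the sketch short.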

Details

ISSN :
2631-9268
Volume :
3
Database :
OpenAIRE
Journal :
NAR Genomics and Bioinformatics
Accession number :
edsair.doi.dedup.....1acd4d2d5ad7d105bf596bf668c1c66b
Full Text :
https://doi.org/10.1093/nargab/lqab047