MGEScan-non-LTR: computational identification and classification of autonomous non-LTR retrotransposons in eukaryotic genomes
- Source :
- Nucleic Acids Research
- Publication Year :
- 2009
- Publisher :
- Oxford University Press (OUP), 2009.
Abstract
- Computational methods for genome-wide identification of mobile genetic elements (MGEs) have become increasingly necessary for both genome annotation and evolutionary studies. Non-long terminal repeat (non-LTR) retrotransposons are a class of MGEs that have been found in most eukaryotic genomes, sometimes in extremely high numbers. In this article, we present a computational tool, MGEScan-non-LTR, for the identification of non-LTR retrotransposons in genomic sequences, following a computational approach inspired by a generalized hidden Markov model (GHMM). Three different states represent two different protein domains and inter-domain linker regions encoded in the non-LTR retrotransposons, and their scores are evaluated by using profile hidden Markov models (for protein domains) and Gaussian Bayes classifiers (for linker regions), respectively. In order to classify the non-LTR retrotransposons into one of the 12 previously characterized clades using the same model, we defined separate states for different clades. MGEScan-non-LTR was tested on the genome sequences of four eukaryotic organisms, Drosophila melanogaster, Daphnia pulex, Ciona intestinalis and Strongylocentrotus purpuratus. For the D. melanogaster genome, MGEScan-non-LTR found all known 'full-length' elements and simultaneously classified them into the clades CR1, I, Jockey, LOA and R1. Notably, for the D. pulex genome, in which no non-LTR retrotransposon has been annotated, MGEScan-non-LTR found a significantly larger number of elements than did RepeatMasker, using the current version of the RepBase Update library. We also identified novel elements in the other two genomes, which have only been partially studied for non-LTR retrotransposons.
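The abstract says linker regions are scored with Gaussian Bayes classifiers and that separate states are defined per clade. The sketch below illustrates that general idea only, and it is not the MGEScan-non-LTR implementation. It assumes a single feature (linker length in base pairs), which the abstract does not specify. The clade names, parameter values, and function names are hypothetical placeholders.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def classify_linker(length, clade_params, priors=None):
    """Score a linker length under each clade's Gaussian and pick the best.

    clade_params maps clade name -> (mean, std) of the assumed feature.
    priors maps clade name -> prior probability (uniform if omitted).
    Returns (best_clade, log_scores), where each log score is
    log prior + log likelihood (an unnormalized log posterior).
    """
    if priors is None:
        priors = {clade: 1.0 / len(clade_params) for clade in clade_params}
    scores = {
        clade: math.log(priors[clade]) + gaussian_log_pdf(length, mean, std)
        for clade, (mean, std) in clade_params.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Made-up illustrative parameters (mean, std); NOT values from the paper.
TOY_PARAMS = {
    "clade_A": (300.0, 40.0),
    "clade_B": (600.0, 80.0),
}

best_clade, log_scores = classify_linker(320.0, TOY_PARAMS)
```

In a GHMM-style model, a log score of this kind for the linker state could be combined with profile HMM scores for the flanking protein-domain states. How the actual tool combines them is not described in this record.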
- Subjects :
- Retroelements
Molecular Sequence Data
Protein domain
Retrotransposon
Computational biology
Genome
Phylogenetics
Genetics
Animals
Amino Acid Sequence
Strongylocentrotus purpuratus
Phylogeny
Sequence Homology, Amino Acid
biology
Genomics
Genome project
biology.organism_classification
Markov Chains
Ciona intestinalis
Protein Structure, Tertiary
Drosophila melanogaster
Daphnia
Eukaryotic chromosome fine structure
Methods Online
Mobile genetic elements
Details
- ISSN :
- 1362-4962 and 0305-1048
- Volume :
- 37
- Database :
- OpenAIRE
- Journal :
- Nucleic Acids Research
- Accession number :
- edsair.doi.dedup.....7d6f172ec5fa3d0736a2e1985a1b186a
- Full Text :
- https://doi.org/10.1093/nar/gkp752