A Novel Heuristic for Local Multiple Alignment of Interspersed DNA Repeats
- Source :
- IEEE/ACM Transactions on Computational Biology and Bioinformatics. 6:180-189
- Publication Year :
- 2009
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2009.
Abstract
- Pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides between related species. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with an efficient filtration method for identifying interspersed repeats in genome sequences. During gapped extension, we use the MUSCLE implementation of progressive global multiple alignment with iterative refinement. The resulting gapped extensions potentially contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any time-reversible nucleotide substitution matrix. We evaluate the performance of our method and previous approaches on a hybrid data set of real genomic DNA with simulated interspersed repeats. Our method outperforms a related method in terms of sensitivity, positive predictive value, and localizing boundaries of homology. The described methods have been implemented in freely available software, Repeatoire, available from: http://wwwabi.snv.jussieu.fr/public/Repeatoire. © 2006 IEEE.
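The homology-filtering step in the abstract can be illustrated with a small sketch. This is not the Repeatoire implementation: it assumes a pairwise, ungapped alignment rather than a multiple alignment, a two-state HMM (homologous vs. unrelated), and the Jukes-Cantor model as one example of a time-reversible substitution model. The function names (`jc69_joint`, `homology_posteriors`) and the parameter values (`d`, `stay`, `prior_h`) are hypothetical choices made for illustration only. Posterior probabilities are computed with the standard forward-backward algorithm.

```python
import math

NUCS = "ACGT"


def jc69_joint(d):
    """Joint probability of each nucleotide pair (a, b) in a homologous
    column under Jukes-Cantor, with d expected substitutions per site."""
    e = math.exp(-4.0 * d / 3.0)
    same = 0.25 * (0.25 + 0.75 * e)   # pi_a * P(a -> a)
    diff = 0.25 * (0.25 - 0.25 * e)   # pi_a * P(a -> b), a != b
    return {(a, b): (same if a == b else diff) for a in NUCS for b in NUCS}


def homology_posteriors(seq_a, seq_b, d=0.3, stay=0.95, prior_h=0.5):
    """Posterior probability that each aligned column is homologous.

    Assumptions (illustrative, not from the paper): ungapped pairwise
    columns over A/C/G/T, symmetric transitions with self-transition
    probability `stay`, uniform base frequencies in the unrelated state.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(seq_a)
    if n == 0:
        return []

    emit_h = jc69_joint(d)
    emit_u = 1.0 / 16.0  # independent uniform bases: 1/4 * 1/4
    emis = [(emit_h[(a, b)], emit_u) for a, b in zip(seq_a, seq_b)]
    trans = [[stay, 1.0 - stay], [1.0 - stay, stay]]  # state 0 = H, 1 = U
    init = [prior_h, 1.0 - prior_h]

    # Forward pass, rescaled at every column to avoid underflow.
    fwd = []
    prev = None
    for i, e in enumerate(emis):
        if i == 0:
            cur = [init[s] * e[s] for s in (0, 1)]
        else:
            cur = [e[s] * sum(prev[r] * trans[r][s] for r in (0, 1))
                   for s in (0, 1)]
        z = sum(cur)
        prev = [x / z for x in cur]
        fwd.append(prev)

    # Backward pass, also rescaled per column.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        e = emis[i + 1]
        cur = [sum(trans[s][r] * e[r] * bwd[i + 1][r] for r in (0, 1))
               for s in (0, 1)]
        z = sum(cur)
        bwd[i] = [x / z for x in cur]

    # Posterior of the homologous state, normalized per column.
    post = []
    for f, b in zip(fwd, bwd):
        h, u = f[0] * b[0], f[1] * b[1]
        post.append(h / (h + u))
    return post
```

Columns whose posterior falls below a chosen threshold could then be trimmed from a gapped extension. The threshold, like the parameters above, would need to be tuned for real data.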
- Subjects :
- DNA, Bacterial
Bioinformatics
Iterative method
Molecular Sequence Data
Interspersed repeat
Mycoplasma genitalium
Sequence alignment
Computational biology
Biology
Genome
Iterative refinement
Sequence Homology, Nucleic Acid
Genetics
Computer Simulation
Hidden Markov model
Models, Statistical
Multiple sequence alignment
Base Sequence
Applied Mathematics
DNA
Interspersed Repetitive Sequences
Quantitative Biology::Genomics
Markov Chains
Sequence Alignment
Genome, Bacterial
Software
Biotechnology
Details
- ISSN :
- 1545-5963
- Volume :
- 6
- Database :
- OpenAIRE
- Journal :
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
- Accession number :
- edsair.doi.dedup.....8daf5311776a5ae1d5fe4e233e68bc56
- Full Text :
- https://doi.org/10.1109/tcbb.2009.9