Estimation of duplication history under a stochastic model for tandem repeats
- Source :
- BMC Bioinformatics, Vol 20, Iss 1, Pp 1-11 (2019)
- Publication Year :
- 2019
- Publisher :
- Springer Science and Business Media LLC, 2019.
Abstract
- Background: Tandem repeat sequences are common in the genomes of many organisms and are known to cause important phenomena such as gene silencing and rapid morphological changes. Due to the presence of multiple copies of the same pattern in tandem repeats and their high variability, they contain a wealth of information about the mutations that have led to their formation. The ability to extract this information can enhance our understanding of evolutionary mechanisms.
Results: We present a stochastic model for the formation of tandem repeats via tandem duplication and substitution mutations. Based on the analysis of this model, we develop a method for estimating the relative mutation rates of duplications and substitutions, as well as the total number of mutations, in the history of a tandem repeat sequence. We validate our estimation method via Monte Carlo simulation and show that it outperforms the state-of-the-art algorithm for discovering the duplication history. We also apply our method to tandem repeat sequences in the human genome, where it demonstrates the different behaviors of micro- and mini-satellites and can be used to compare mutation rates across chromosomes. It is observed that chromosomes that exhibit the highest mutation activity in tandem repeat regions are the same as those thought to have the highest overall mutation rates. However, unlike previous works that rely on comparing human and chimpanzee genomes to measure mutation rates, the proposed method allows us to find chromosomes with the highest mutation activity based on a single genome, in essence by comparing (approximate) copies of the pattern in tandem repeats.
Conclusion: The prevalence of tandem repeats in most organisms and the efficiency of the proposed method enable studying various aspects of the formation of tandem repeats and the surrounding sequences in a wide range of settings.
Availability: The implementation of the estimation method is available at http://ips.lab.virginia.edu/smtr.
Electronic supplementary material: The online version of this article (10.1186/s12859-019-2603-1) contains supplementary material, which is available to authorized users.
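The abstract describes tandem repeats forming through tandem duplication and substitution mutations, with the estimator validated by Monte Carlo simulation, but this record does not give the model's details. The Python sketch below simulates a simplified process of that kind purely for illustration, under these assumptions: every duplication copies a window whose length equals the seed pattern and inserts the copy directly after the original, each mutation event is a duplication with a fixed probability `p_dup` (otherwise a substitution), and substitutions pick the position and the new nucleotide uniformly at random. The names `tandem_duplicate`, `substitute`, `simulate_tandem_repeat`, and `p_dup` are hypothetical; this is not the authors' model or their estimation method.

```python
import random

NUCLEOTIDES = "ACGT"


def tandem_duplicate(seq: str, start: int, k: int) -> str:
    """Copy seq[start:start+k] and insert the copy immediately after it (uvw -> uvvw)."""
    block = seq[start:start + k]
    return seq[:start + k] + block + seq[start + k:]


def substitute(seq: str, pos: int, rng: random.Random) -> str:
    """Replace the symbol at pos with a different nucleotide chosen uniformly."""
    new = rng.choice([b for b in NUCLEOTIDES if b != seq[pos]])
    return seq[:pos] + new + seq[pos + 1:]


def simulate_tandem_repeat(seed_pattern: str, n_mutations: int,
                           p_dup: float, rng: random.Random):
    """Hypothetical simulator: apply n_mutations random events to seed_pattern.

    Assumption (not from the paper): each event is a tandem duplication of
    length len(seed_pattern) with probability p_dup, otherwise a substitution.
    Returns (final sequence, number of duplications, number of substitutions).
    """
    seq = seed_pattern
    k = len(seed_pattern)
    n_dup = n_sub = 0
    for _ in range(n_mutations):
        if rng.random() < p_dup:
            # Any window of length k fits because len(seq) >= k at all times.
            start = rng.randrange(len(seq) - k + 1)
            seq = tandem_duplicate(seq, start, k)
            n_dup += 1
        else:
            pos = rng.randrange(len(seq))
            seq = substitute(seq, pos, rng)
            n_sub += 1
    return seq, n_dup, n_sub


if __name__ == "__main__":
    rng = random.Random(0)
    seq, n_dup, n_sub = simulate_tandem_repeat("ACG", 20, 0.6, rng)
    print(seq, n_dup, n_sub)
```

Because only duplications change the length, the final sequence has length `k * (1 + n_dup)`, and with `p_dup = 1.0` the result is simply the seed pattern repeated. Generating many such sequences with a known `p_dup` and checking how closely an estimator recovers it is the general idea behind validating an estimator by Monte Carlo simulation.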
- Subjects :
- Mutation rate
Stochastic approximation
Computational biology
Biology
lcsh:Computer applications to medicine. Medical informatics
Biochemistry
Genome
03 medical and health sciences
0302 clinical medicine
Mutation Rate
Tandem repeat
Structural Biology
Gene Duplication
Gene duplication
Humans
Computer Simulation
Duplication history
lcsh:QH301-705.5
Molecular Biology
030304 developmental biology
Chromosomes, Human, X
Stochastic Processes
0303 health sciences
Genome, Human
Applied Mathematics
Tandem repeats
Computer Science Applications
lcsh:Biology (General)
Tandem Repeat Sequences
030220 oncology & carcinogenesis
Mutation
Mutation (genetic algorithm)
lcsh:R858-859.7
Human genome
Tandem exon duplication
DNA microarray
Monte Carlo Method
Estimation
Algorithms
Research Article
Details
- ISSN :
- 1471-2105
- Volume :
- 20
- Database :
- OpenAIRE
- Journal :
- BMC Bioinformatics
- Accession number :
- edsair.doi.dedup.....e5bbb0f457e62bcebe1b6f2ca0812699
- Full Text :
- https://doi.org/10.1186/s12859-019-2603-1