Comparing the Statistical Fate of Paralogous and Orthologous Sequences
- Source :
- Genetics, Genetics Society of America, 2016, 204 (2), pp.475-482. ⟨10.1534/genetics.116.193912⟩
- Publication Year :
- 2016
- Publisher :
- HAL CCSD, 2016.
Abstract
- For several decades, sequence alignment has been a widely used tool in bioinformatics. For instance, finding homologous sequences with a known function in large databases is used to gain insight into the function of nonannotated genomic regions. Very efficient tools like BLAST have been developed to identify and rank possible homologous sequences. To estimate the significance of the homology, the ranking of alignment scores takes a background model for random sequences into account. Using this model, we can estimate the probability of finding two exactly matching subsequences by chance in two unrelated sequences. For two homologous sequences, the corresponding probability is much higher, which allows us to identify them. Here we focus on the distribution of lengths of exact sequence matches between protein-coding regions of pairs of evolutionarily distant genomes. We show that this distribution exhibits a power-law tail with an exponent α=−5. Developing a simple model of sequence evolution by substitutions and segmental duplications, we show analytically and computationally that paralogous and orthologous gene pairs contribute differently to this distribution. Our model explains the differences observed in the comparison of coding and noncoding parts of genomes, thus providing a better understanding of statistical properties of genomic sequences and their evolution.
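The background-model argument in the abstract can be illustrated with a minimal sketch. It is not the authors' method or code. It assumes an i.i.d. letter model for unrelated sequences and ignores edge effects, so the expected count of chance matches decays geometrically with match length, in contrast to the power-law tail reported for real genomes. All function names here are hypothetical, chosen for illustration only.

```python
from collections import Counter


def match_probability(freqs):
    """Probability that two independent random letters are identical,
    given letter frequencies (assumed i.i.d. background model)."""
    return sum(f * f for f in freqs.values())


def expected_random_matches(len_a, len_b, p, r):
    """Approximate expected number of maximal exact matches of length r
    between two unrelated i.i.d. sequences (edge effects ignored).
    A maximal match needs r matching positions flanked by two mismatches."""
    return len_a * len_b * (1 - p) ** 2 * p ** r


def maximal_match_lengths(a, b):
    """Histogram of maximal exact match lengths between a and b,
    found by scanning every diagonal of the comparison matrix."""
    hist = Counter()
    for offset in range(-(len(b) - 1), len(a)):
        i = max(offset, 0)   # start position in a
        j = i - offset       # start position in b
        run = 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                run += 1
            else:
                if run:
                    hist[run] += 1
                run = 0
            i += 1
            j += 1
        if run:              # close a run that reaches the end of a diagonal
            hist[run] += 1
    return hist


if __name__ == "__main__":
    # Uniform nucleotide frequencies are an assumption for this example.
    p = match_probability({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25})
    for r in (5, 10, 15):
        print(r, expected_random_matches(10**6, 10**6, p, r))
    print(maximal_match_lengths("AAB", "AB"))
```

The diagonal scan takes time proportional to the product of the sequence lengths, so it is only practical for short sequences. Genome-scale comparisons would need an index-based approach such as a suffix array.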
- Subjects :
- 0301 basic medicine
Genome evolution
statistical genomics
Sequence analysis
[SDV]Life Sciences [q-bio]
Sequence Homology
Genomics
Sequence alignment
Computational biology
comparative genomics
Biology
Investigations
genome evolution
01 natural sciences
Genome
Homologous Sequences
Homology (biology)
Evolution, Molecular
03 medical and health sciences
Segmental Duplications, Genomic
Genetics
0101 mathematics
DNA duplications
Probability
030304 developmental biology
Mathematics
Segmental duplication
Comparative genomics
0303 health sciences
Exact sequence
Models, Genetic
010102 general mathematics
Computational Biology
030104 developmental biology
Exponent
Sequence Alignment
Orthologous Gene
Details
- Language :
- English
- ISSN :
- 00166731
- Database :
- OpenAIRE
- Journal :
- Genetics, Genetics Society of America, 2016, 204 (2), pp.475-482. ⟨10.1534/genetics.116.193912⟩
- Accession number :
- edsair.doi.dedup.....ab92ab213e5dafd1be3a3fbe7fac5d32