Simulation data for the estimation of numerical constants for approximating pairwise evolutionary distances between amino acid sequences
- Source :
- Data in Brief, Elsevier, 2019, 25, pp.104212. ⟨10.1016/j.dib.2019.104212⟩
- Publication Year :
- 2019
- Publisher :
- HAL CCSD, 2019.
Abstract
- Estimating the number of substitution events per site that have occurred during the evolution of a pair of amino acid sequences is a common task in phylogenetics and comparative genomics that often requires quite slow maximum-likelihood procedures when taking into account explicit evolutionary models. Data presented in this article are large sets of numbers of substitution events and associated numbers of observed differences between pairs of aligned amino acid sequences that have been generated through a simulation procedure of sequence evolution under a broad range of evolutionary models. These data are available at https://zenodo.org/record/2653704 (doi:10.5281/zenodo.2653704). They are accompanied in this paper by figures showing the strong relationship between the corresponding evolutionary and uncorrected distances, as well as estimated numerical constants that determine non-linear functions that fit the simulated data. These numerical constants can be useful to quickly estimate pairwise evolutionary distances directly from uncorrected distances between aligned amino acid sequences.
- Keywords: Amino acid, Evolutionary model, Corrected distance, Uncorrected distance, Computer simulation, Nonlinear regression
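- Illustration: the idea of turning an uncorrected distance into an evolutionary distance through a closed-form function can be sketched as follows. This is a minimal sketch, not the article's method: it uses the classical Poisson correction d = -ln(1 - p) as a stand-in, because the fitted functions and numerical constants of the article are not reproduced in this record. The function names and the set of characters treated as gaps are assumptions made for this example.

```python
import math

# Characters skipped when comparing aligned sites.
# Assumption for this sketch: gaps and unknown residues are ignored.
GAP_CHARS = {"-", "?", "X"}


def uncorrected_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned amino acid sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # site not comparable
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def poisson_corrected_distance(p):
    """Classical Poisson correction, d = -ln(1 - p).

    Stand-in for a model-specific fitted function; it is NOT one of the
    functions or constants estimated in the article.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    p = uncorrected_distance("ACDEFGHIKL", "ACDEFGHIKV")
    print(p, poisson_corrected_distance(p))
```

  A model-specific function with fitted constants would replace `poisson_corrected_distance` while keeping the same workflow: compute the uncorrected distance once, then apply a cheap closed-form transformation instead of a maximum-likelihood optimization.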
- Subjects :
- lcsh:Computer applications to medicine. Medical informatics
[SDV.BID.SPT]Life Sciences [q-bio]/Biodiversity/Systematics, Phylogenetics and taxonomy
03 medical and health sciences
0302 clinical medicine
Phylogenetics
Agricultural and Biological Science
Nonlinear regression
Statistical physics
Corrected distance
lcsh:Science (General)
030304 developmental biology
Mathematics
Comparative genomics
chemistry.chemical_classification
0303 health sciences
Sequence
Quantitative Biology::Biomolecules
Multidisciplinary
Substitution (logic)
Computer simulation
Quantitative Biology::Genomics
Amino acid
Evolutionary model
Range (mathematics)
Uncorrected distance
chemistry
lcsh:R858-859.7
Pairwise comparison
[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
030217 neurology & neurosurgery
[MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA]
lcsh:Q1-390
Details
- Language :
- English
- ISSN :
- 23523409
- Database :
- OpenAIRE
- Journal :
- Data in Brief, Elsevier, 2019, 25, pp.104212. ⟨10.1016/j.dib.2019.104212⟩
- Accession number :
- edsair.doi.dedup.....61e37c734f56bac6f97e0fce4c64221f