
Simulation data for the estimation of numerical constants for approximating pairwise evolutionary distances between amino acid sequences

Authors :
Alexis Criscuolo
Julien Guglielmini
Thomas Bigot
Institut Pasteur [Paris]
Département de Biologie Computationnelle - Department of Computational Biology
Institut Pasteur [Paris]-Centre National de la Recherche Scientifique (CNRS)
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
The authors are obliged to the Bioinformatics and Biostatistics Hub of Institut Pasteur, Paris, France, for support.
This work used the computational and storage services (TARS cluster) provided by the IT department at Institut Pasteur, Paris. The authors also thank one anonymous reviewer for their fruitful comments.
Source :
Data in Brief, Elsevier, 2019, 25, pp.104212. ⟨10.1016/j.dib.2019.104212⟩
Publication Year :
2019
Publisher :
HAL CCSD, 2019.

Abstract

Estimating the number of substitution events per site that have occurred during the evolution of a pair of amino acid sequences is a common task in phylogenetics and comparative genomics that often requires quite slow maximum-likelihood procedures when taking into account explicit evolutionary models. Data presented in this article are large sets of numbers of substitution events and associated numbers of observed differences between pairs of aligned amino acid sequences that have been generated through a simulation procedure of sequence evolution under a broad range of evolutionary models. These data are available at https://zenodo.org/record/2653704 (doi:10.5281/zenodo.2653704). They are accompanied in this paper by figures showing the strong relationship between the corresponding evolutionary and uncorrected distances, as well as estimated numerical constants that determine non-linear functions that fit the simulated data. These numerical constants can be useful to quickly estimate pairwise evolutionary distances directly from uncorrected distances between aligned amino acid sequences.

Keywords: Amino acid, Evolutionary model, Corrected distance, Uncorrected distance, Computer simulation, Nonlinear regression
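
The idea of fitting a numerical constant that maps uncorrected distances to evolutionary distances can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the one-parameter form d = -b ln(1 - p/b) is a classical correction (b = 1 gives the Poisson correction), not necessarily the function fitted in the article; the (p, d) pairs are synthetic and noiseless rather than taken from the Zenodo dataset; and the names corrected_distance, fit_constant and TRUE_B are hypothetical.

```python
import math


def corrected_distance(p, b):
    """Map an uncorrected distance p (proportion of differing sites) to an
    evolutionary distance using the assumed form d = -b * ln(1 - p / b)."""
    if not 0.0 <= p < b:
        raise ValueError("p must satisfy 0 <= p < b")
    return -b * math.log(1.0 - p / b)


def sum_squared_error(b, pairs):
    """Least-squares criterion over (p, d) pairs for a candidate constant b."""
    return sum((d - corrected_distance(p, b)) ** 2 for p, d in pairs)


def fit_constant(pairs, upper=5.0, tol=1e-9):
    """Golden-section search for the constant b that minimises the criterion.

    The search interval starts just above the largest observed p, because the
    assumed function is only defined for p < b. The bound `upper` is arbitrary.
    """
    lo = max(p for p, _ in pairs) + 1e-6
    hi = upper
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - ratio * (hi - lo)
    x2 = lo + ratio * (hi - lo)
    f1 = sum_squared_error(x1, pairs)
    f2 = sum_squared_error(x2, pairs)
    while hi - lo > tol:
        if f1 < f2:
            # The minimum lies in [lo, x2]; reuse x1 as the new right probe.
            hi, x2, f2 = x2, x1, f1
            x1 = hi - ratio * (hi - lo)
            f1 = sum_squared_error(x1, pairs)
        else:
            # The minimum lies in [x1, hi]; reuse x2 as the new left probe.
            lo, x1, f1 = x1, x2, f2
            x2 = lo + ratio * (hi - lo)
            f2 = sum_squared_error(x2, pairs)
    return (lo + hi) / 2.0


# Hypothetical constant used only to generate synthetic, noiseless pairs.
TRUE_B = 0.95
synthetic_pairs = [
    (k / 20.0, corrected_distance(k / 20.0, TRUE_B)) for k in range(1, 17)
]

fitted_b = fit_constant(synthetic_pairs)
print("fitted constant:", round(fitted_b, 4))
print("distance for p = 0.30:", round(corrected_distance(0.30, fitted_b), 4))
```

Once such a constant has been estimated, converting an uncorrected distance is a single function evaluation, which is what makes this kind of approximation fast compared with a maximum-likelihood estimate per sequence pair. With real simulated data the pairs are noisy, so a dedicated nonlinear regression routine would likely be a better choice than this simple one-dimensional search.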

Details

Language :
English
ISSN :
2352-3409
Database :
OpenAIRE
Journal :
Data in Brief, Elsevier, 2019, 25, pp.104212. ⟨10.1016/j.dib.2019.104212⟩
Accession number :
edsair.doi.dedup.....61e37c734f56bac6f97e0fce4c64221f