
Sensitive Detection of Site-wise Convergent Evolution in Large Protein Alignments with ConDor

Authors :
Morel, Marie
Lemoine, Frédéric
Gascuel, Olivier
Bioinformatique évolutive - Evolutionary Bioinformatics
Institut Pasteur [Paris]-Centre National de la Recherche Scientifique (CNRS)
Université de Paris (UP)
Hub Bioinformatique et Biostatistique - Bioinformatics and Biostatistics HUB
Institut de Systématique, Evolution, Biodiversité (ISYEB)
Muséum national d'Histoire naturelle (MNHN)-École pratique des hautes études (EPHE)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Université des Antilles (UA)
This work was supported by the INCEPTION program (Convention ANR-16-CONV-0005, MM PhD grant) and by the PRAIRIE program (Convention ANR-19-P3IA-0001, OG).
ANR-16-CONV-0005,INCEPTION,Institut Convergences pour l'étude de l'Emergence des Pathologies au Travers des Individus et des populatiONs(2016)
ANR-19-P3IA-0001,PRAIRIE,PaRis Artificial Intelligence Research InstitutE(2019)
Institut Pasteur [Paris] (IP)-Centre National de la Recherche Scientifique (CNRS)
Université Paris Cité (UPCité)
Institut Pasteur [Paris] (IP)-Université Paris Cité (UPCité)
Muséum national d'Histoire naturelle (MNHN)-École Pratique des Hautes Études (EPHE)
Publication Year :
2021
Publisher :
HAL CCSD, 2021.

Abstract

Evolutionary convergences are observed at all levels, from phenotype to DNA and protein sequences, and the changes observed at these different levels tend to be strongly correlated. Here we propose a simulation-based method to detect positions under convergent evolution in large protein alignments, without prior knowledge of the phenotype or environmental constraints. A phylogeny is inferred from the data and used in simulations to estimate, for each position, the expected number of amino-acid changes under stable evolutionary constraints (null model). We then count the number of mutations towards the same amino acid in the data and test whether they occur more often than expected. We applied our method to two real datasets, HIV reverse transcriptase and fish rhodopsin, and to HIV-like simulated data. On the latter, where the convergent events and the substitution model are known, we detected on average two thirds of these events, with a low fraction of false positives. In the HIV data, drug resistance mutations (DRMs) are known to be convergent. Even without any knowledge of patient treatment status, we retrieved more than 70% of the positions corresponding to known DRMs. In the rhodopsin dataset, four substitutions are thought to be convergent, as they change the wavelength of maximum absorption of the photoreceptor and occurred several times independently during evolution. We detected three of them. These results demonstrate the potential of the method to target specific mutations for further study, either experimentally or, for example, with a nonsynonymous/synonymous rate ratio approach. Our software, named ConDor, is available at http://condor.pasteur.cloud.
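The per-position test summarized in the abstract (compare the observed number of changes towards one amino acid with the counts expected under a simulated null model) can be sketched as a one-sided Monte Carlo test. This is a minimal illustration under stated assumptions, not ConDor's implementation: the function names, the (ancestral, derived) representation of inferred substitutions, and the star-tree toy null model with its parameters (`n_branches`, `p_sub`, `freqs`, `root`) are all hypothetical. According to the abstract, ConDor instead simulates along the phylogeny inferred from the data.

```python
import random
from typing import Dict, List, Sequence, Tuple


def count_convergent_changes(changes: Sequence[Tuple[str, str]], target: str) -> int:
    """Count inferred substitutions (ancestral, derived) at one alignment position
    that end in the `target` amino acid; changes starting from `target` are ignored."""
    return sum(1 for anc, der in changes if der == target and anc != target)


def simulate_null_counts(n_branches: int, p_sub: float, freqs: Dict[str, float],
                         root: str, target: str, n_sims: int, seed: int = 0) -> List[int]:
    """HYPOTHETICAL toy null model (not ConDor's simulator): a star tree whose
    branches all start from `root`. On each branch, with probability `p_sub`, a new
    amino acid is drawn from `freqs` (drawing `root` leaves the state unchanged).
    Returns the number of changes towards `target` in each replicate."""
    rng = random.Random(seed)
    aas, weights = list(freqs), list(freqs.values())
    counts = []
    for _ in range(n_sims):
        changes = []
        for _ in range(n_branches):
            if rng.random() < p_sub:
                changes.append((root, rng.choices(aas, weights=weights)[0]))
        counts.append(count_convergent_changes(changes, target))
    return counts


def empirical_p_value(observed: int, simulated: Sequence[int]) -> float:
    """One-sided Monte Carlo p-value: share of null replicates reaching the
    observed count, with +1 terms so the estimate is never exactly zero."""
    n_extreme = sum(1 for s in simulated if s >= observed)
    return (n_extreme + 1) / (len(simulated) + 1)


if __name__ == "__main__":
    # Hypothetical inferred substitutions at one position: three end in "V".
    observed = count_convergent_changes([("A", "V"), ("I", "V"), ("V", "A"), ("L", "V")], "V")
    null = simulate_null_counts(n_branches=50, p_sub=0.05,
                                freqs={"A": 0.4, "I": 0.3, "V": 0.1, "L": 0.2},
                                root="A", target="V", n_sims=999)
    print(observed, empirical_p_value(observed, null))
```

The `+1` in both numerator and denominator is a common convention for Monte Carlo p-values: it counts the observed data as one more draw and keeps the test valid with a finite number of simulated replicates.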

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.dedup.wf.001..384d1a93fbb39a7d4035f539670a4a6c