1. A fast method for calculating reliable event supports in tree reconciliations via Pareto optimality
- Author
-
Celine Scornavacca, Thu-Hien To, Edwin Jacox, Vincent Ranwez, Institut des Sciences de l'Evolution de Montpellier (UMR ISEM), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-École Pratique des Hautes Études (EPHE), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université de Montpellier (UM)-Institut de recherche pour le développement [IRD] : UR226-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS), Institut de Recherche pour le Développement (IRD [France-Ouest]), École Pratique des Hautes Études (EPHE), Université Paris sciences et lettres (PSL), Institut de Biologie Computationnelle (IBC), Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Amélioration génétique et adaptation des plantes méditerranéennes et tropicales (UMR AGAP), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Institut National de la Recherche Agronomique (INRA)-Centre international d'études supérieures en sciences agronomiques (Montpellier SupAgro)-Institut national d’études supérieures agronomiques de Montpellier (Montpellier SupAgro), Institut national d’études supérieures agronomiques de Montpellier (Montpellier SupAgro), French Agence Nationale de la Recherche Investissements d'Avenir/Bioinformatique : ANR-10-BINF-01-02, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-École pratique des hautes études (EPHE)-Université de Montpellier (UM)-Institut de recherche pour le développement [IRD] : UR226-Centre National de la Recherche Scientifique (CNRS), École pratique des hautes études (EPHE), Université de Montpellier (UM)-Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Institut national d’études supérieures agronomiques de Montpellier (Montpellier SupAgro)-Institut National de la Recherche Agronomique (INRA)-Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Centre international d'études supérieures en sciences agronomiques (Montpellier SupAgro), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université de Montpellier (UM)-Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Centre National de la Recherche Scientifique (CNRS)-Institut de recherche pour le développement [IRD] : UR226, Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro), and Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)
- Subjects
[SDV.SA]Life Sciences [q-bio]/Agricultural sciences ,programme informatique ,multi-objective particle swarm optimization (mopso) ,Supports ,Reliability (computer networking) ,0206 medical engineering ,02 engineering and technology ,Biology ,computer.software_genre ,Gene evolution ,Biochemistry ,Set (abstract data type) ,évolution génétique ,Evolution, Molecular ,03 medical and health sciences ,Structural Biology ,computer program ,Tree reconciliation ,Proteobacteria ,Computer Simulation ,bioinformatique ,Molecular Biology ,Phylogeny ,030304 developmental biology ,Event (probability theory) ,0303 health sciences ,arbre ,Applied Mathematics ,Pareto principle ,Reproducibility of Results ,tree reconciliation ,gene evolution ,phylogenetics ,parsimony ,supports ,Agricultural sciences ,Computer Science Applications ,Phylogenetics ,Range (mathematics) ,Tree (data structure) ,Genes, Bacterial ,A priori and a posteriori ,Graph (abstract data type) ,Data mining ,computer ,Algorithm ,Parsimony ,020602 bioinformatics ,Sciences agricoles ,algorithme ,Algorithms ,génétique appliquée ,Research Article - Abstract
Background Given a gene and a species tree, reconciliation methods attempt to retrieve the macro-evolutionary events that best explain the discrepancies between the two tree topologies. The DTL parsimonious approach searches for a most parsimonious reconciliation between a gene tree and a (dated) species tree, considering four possible macro-evolutionary events (speciation, duplication, transfer, and loss) with specific costs. Unfortunately, many events are erroneously predicted due to errors in the input trees, inappropriate input cost values or because of the existence of several equally parsimonious scenarios. It is thus crucial to provide a measure of the reliability for predicted events. It has been recently proposed that the reliability of an event can be estimated via its frequency in the set of most parsimonious reconciliations obtained using a variety of reasonable input cost vectors. To compute such a support, a straightforward but time-consuming approach is to generate the costs slightly departing from the original ones, independently compute the set of all most parsimonious reconciliations for each vector, and combine these sets a posteriori. Another proposed approach uses Pareto-optimality to partition cost values into regions which induce reconciliations with the same number of DTL events. The support of an event is then defined as its frequency in the set of regions. However, often, the number of regions is not large enough to provide reliable supports. Results We present here a method to compute efficiently event supports via a polynomial-sized graph, which can represent all reconciliations for several different costs. Moreover, two methods are proposed to take into account alternative input costs: either explicitly providing an input cost range or allowing a tolerance for the over cost of a reconciliation. Our methods are faster than the region based method, substantially faster than the sampling-costs approach, and have a higher event-prediction accuracy on simulated data. Conclusions We propose a new approach to improve the accuracy of event supports for parsimonious reconciliation methods to account for uncertainty in the input costs. Furthermore, because of their speed, our methods can be used on large gene families. Our algorithms are implemented in the ecceTERA program, freely available from http://mbb.univ-montp2.fr/MBB/. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0803-x) contains supplementary material, which is available to authorized users.
- Full Text
- View/download PDF