1. Learning Retention Mechanisms and Evolutionary Parameters of Duplicate Genes from Their Expression Data
- Author
- Michael DeGiorgio and Raquel Assis
- Subjects
Computer science; neural network; Decision tree; Gene Expression; Computational biology; Biology; AcademicSubjects/SCI01180; Evolution, Molecular; Gene expression; Gene duplication; Genetics; Methods; Ornstein–Uhlenbeck; Animals; Gene; Molecular Biology; Ecology, Evolution, Behavior and Systematics; Natural selection; neofunctionalization; Artificial neural network; Models, Genetic; AcademicSubjects/SCI01130; gene duplication; Phenotype; ComputingMethodologies_PATTERNRECOGNITION; Subfunctionalization; Neofunctionalization; Drosophila; ComputingMethodologies_GENERAL; Neural Networks, Computer; Classifier (UML); subfunctionalization; Functional divergence; Software
- Abstract
Learning about the roles that duplicate genes play in the origins of novel phenotypes requires an understanding of how their functions evolve. To date, only one method, CDROM, has been developed with this goal in mind. In particular, CDROM employs gene expression distances as proxies for functional divergence, and then classifies the evolutionary mechanisms retaining duplicate genes from comparisons of these distances in a decision tree framework. However, CDROM does not account for stochastic shifts in gene expression or leverage advances in contemporary statistical learning for performing classification, nor is it capable of predicting the underlying parameters of duplicate gene evolution. Thus, here we develop CLOUD, a multi-layer neural network built upon a model of gene expression evolution that can both classify duplicate gene retention mechanisms and predict their underlying evolutionary parameters. We show that not only is the CLOUD classifier substantially more powerful and accurate than CDROM, but that it also yields accurate parameter predictions, enabling a better understanding of the specific forces driving the evolution and long-term retention of duplicate genes. Further, application of the CLOUD classifier and predictor to empirical data from Drosophila recapitulates many previous findings about gene duplication in this lineage, showing that new functions often emerge rapidly and asymmetrically in younger duplicate gene copies, and that functional divergence is driven by strong natural selection. Hence, CLOUD represents the best available method for classifying retention mechanisms and predicting evolutionary parameters of duplicate genes, thereby also highlighting the utility of incorporating sophisticated statistical learning techniques to address long-standing questions about evolution after gene duplication.
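To make the distance-based idea in the abstract concrete, the following is a minimal sketch of a decision tree over gene expression distances. It is not the CDROM or CLOUD implementation: the Euclidean distance, the single `cutoff` threshold, the function names, and the class labels "conservation" and "specialization" are all assumptions introduced here for illustration, since the abstract does not specify the actual rules.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length expression profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same set of tissues")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def classify_retention(parent, child, ancestor, cutoff):
    """Toy decision tree over expression distances.

    Hypothetical rules (assumed, not taken from the abstract):
    each duplicate copy is compared with the ancestral profile,
    and a distance above `cutoff` is treated as divergence.
    """
    # Combined expression of both copies, tissue by tissue.
    combined = [p + c for p, c in zip(parent, child)]

    parent_diverged = euclidean(parent, ancestor) > cutoff
    child_diverged = euclidean(child, ancestor) > cutoff
    combined_diverged = euclidean(combined, ancestor) > cutoff

    if not parent_diverged and not child_diverged:
        return "conservation"
    if parent_diverged != child_diverged:
        # Exactly one copy has moved away from the ancestral profile.
        return "neofunctionalization"
    # Both copies diverged; check whether together they still
    # reproduce the ancestral profile.
    if not combined_diverged:
        return "subfunctionalization"
    return "specialization"


# Hypothetical two-tissue example: each copy keeps one ancestral tissue.
ancestor = [2.0, 2.0]
print(classify_retention([2.0, 0.0], [0.0, 2.0], ancestor, cutoff=1.0))
```

A fixed threshold of this kind makes no allowance for random fluctuation in expression, which is the limitation the abstract says CLOUD addresses by building on a model of gene expression evolution.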
- Published
- 2020