A Cross-Level Information Transmission Network for Hierarchical Omics Data Integration and Phenotype Prediction from a New Genotype
- Source :
- Bioinformatics
- Publication Year :
- 2021
Abstract
- Motivation: An unsolved fundamental problem in biology is to predict phenotypes from a new genotype under environmental perturbations. The emergence of multiple omics data provides new opportunities but imposes great challenges in the predictive modeling of genotype-phenotype associations. Firstly, the high dimensionality of genomics data and the lack of coherent labeled data often make existing supervised learning techniques less successful. Secondly, it is challenging to integrate heterogeneous omics data from different resources. Finally, few works have explicitly modeled the information transmission from DNA to phenotype, which involves multiple intermediate molecular types. Higher-level features (e.g. gene expression) usually have stronger discriminative and interpretable power than lower-level features (e.g. somatic mutation).
Results: We propose a novel Cross-LEvel Information Transmission (CLEIT) network framework to address the above issues. CLEIT aims to represent the asymmetrical multi-level organization of the biological system by integrating multiple incoherent omics data and to improve the prediction power of low-level features. CLEIT first learns the latent representation of the high-level domain, then uses it as a ground-truth embedding to improve the representation learning of the low-level domain through a contrastive loss. In addition, CLEIT can leverage unlabeled heterogeneous omics data to improve the generalizability of the predictive model. We demonstrate the effectiveness and significant performance boost of CLEIT in predicting anti-cancer drug sensitivity from somatic mutations with the assistance of gene expression, compared with state-of-the-art methods. CLEIT provides a general framework to model information transmission and integrate multi-modal data in a multi-level system.
Availability and implementation: The source code is freely available at https://github.com/XieResearchGroup/CLEIT.
Supplementary information: Supplementary data are available at Bioinformatics online.
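The abstract states only that the low-level representation is trained against the high-level embedding "in the form of contrastive loss"; it does not give the loss itself. The following is a minimal NumPy sketch of one common choice, an InfoNCE-style objective, and is an assumption for illustration rather than the CLEIT implementation. The function names, the temperature value, and the random toy embeddings are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def contrastive_alignment_loss(low_emb, high_emb, temperature=0.1):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    Row i of low_emb (e.g. a mutation-derived embedding) is pulled toward
    row i of high_emb (e.g. a fixed expression-derived embedding) and pushed
    away from the other rows of high_emb in the batch.
    """
    z_low = l2_normalize(low_emb)
    z_high = l2_normalize(high_emb)
    logits = z_low @ z_high.T / temperature
    # Subtract the row maximum for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching pair for sample i sits on the diagonal.
    return -np.mean(np.diag(log_prob))


# Toy data: 8 samples with 16-dimensional embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
high = rng.normal(size=(8, 16))                    # stands in for the high-level targets
aligned = high + 0.01 * rng.normal(size=(8, 16))   # low-level embeddings near their targets
mismatched = aligned[::-1]                         # same embeddings, wrong sample pairing

loss_aligned = contrastive_alignment_loss(aligned, high)
loss_mismatched = contrastive_alignment_loss(mismatched, high)
```

In this sketch the high-level embeddings are treated as fixed targets, which mirrors the "ground-truth embedding" wording of the abstract; a correctly paired batch yields a lower loss than a mispaired one.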
- Subjects :
- Statistics and Probability
Source code
Genomics
Machine learning
Biochemistry
Domain (software engineering)
Discriminative model
Leverage (statistics)
Representation (mathematics)
Molecular Biology
Supervised learning
Original Papers
Computer Science Applications
Computational Mathematics
Computing Methodologies: Pattern Recognition
Computational Theory and Mathematics
Artificial intelligence
Data and Text Mining
Feature learning
Details
- ISSN :
- 1367-4811
- Database :
- OpenAIRE
- Journal :
- Bioinformatics (Oxford, England)
- Accession number :
- edsair.doi.dedup.....84c9a28561f63af6cdb201652dd955f6