Benchmarking joint multi-omics dimensionality reduction approaches for cancer study
- Publication Year :
- 2020
- Publisher :
- Cold Spring Harbor Laboratory, 2020.
Abstract
- High-dimensional multi-omics data are now standard in biology. They can greatly enhance our understanding of biological systems when effectively integrated. To achieve this multi-omics data integration, Joint Dimensionality Reduction (jDR) methods are among the most efficient approaches. However, several jDR methods are available, urging the need for a comprehensive benchmark with practical guidelines. We performed a systematic evaluation of nine representative jDR methods using three complementary benchmarks. First, we evaluated their performances in retrieving ground-truth sample clustering from simulated multi-omics datasets. Second, we used TCGA cancer data to assess their strengths in predicting survival, clinical annotations and known pathways/biological processes. Finally, we assessed their classification of multi-omics single-cell data. From these in-depth comparisons, we observed that intNMF performs best in clustering, while MCIA offers a consistent and effective behavior across many contexts. The full code of this benchmark is implemented in a Jupyter notebook, multi-omics mix (momix), to foster reproducibility, and support data producers, users and future developers.
- Subjects :
- [SDV]Life Sciences [q-bio]
Sample (statistics)
computer.software_genre
Machine learning
Single-Cell
03 medical and health sciences
0302 clinical medicine
Code (cryptography)
[MATH]Mathematics [math]
Cluster analysis
Cancer
030304 developmental biology
Multi-omics
0303 health sciences
business.industry
Dimensionality reduction
Matrix factorization
Benchmarking
[STAT]Statistics [stat]
030220 oncology & carcinogenesis
Benchmark (computing)
Data integration
Artificial intelligence
business
Joint (audio engineering)
computer
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....dc0cfc984e3e8b2b207b4e47dcc33544
- Full Text :
- https://doi.org/10.1101/2020.01.14.905760