1. Guidelines for cell-type heterogeneity quantification based on a comparative analysis of reference-free DNA methylation deconvolution software
- Author
Clémentine Decamps, Florian Privé, Raphael Bacher, Daniel Jost, Arthur Waguet, HADACA consortium, Eugene Andres Houseman, Eugene Lurie, Pavlo Lutsik, Aleksandar Milosavljevic, Michael Scherer, Michael G. B. Blum, Magali Richard, Techniques de l'Ingénierie Médicale et de la Complexité - Informatique, Mathématiques et Applications Grenoble - UMR 5525 (TIMC-IMAG), VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA), Translational Innovation in Medicine and Complexity / Recherche Translationnelle et Innovation en Médecine et Complexité - UMR 5525 (TIMC), and Centre National de la Recherche Scientifique (CNRS)
- Subjects
Computer science, Pipeline (computing), Initialization, Feature selection, Computational biology, Deconvolution, lcsh:Computer applications to medicine. Medical informatics, Biochemistry, 01 natural sciences, 010104 statistics & probability, 03 medical and health sciences, 0302 clinical medicine, Structural Biology, Neoplasms, Humans, Cell heterogeneity, Computer Simulation, Epigenetics, 0101 mathematics, Molecular Biology, lcsh:QH301-705.5, Selection (genetic algorithm), ComputingMilieux_MISCELLANEOUS, 030304 developmental biology, 0303 health sciences, DNA methylation, Applied Mathematics, Methodology Article, Matrix factorization, Computational Biology, Methylation, [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM], Computer Science Applications, lcsh:Biology (General), Benchmark (computing), lcsh:R858-859.7, DNA microarray, R package/pipeline, 030217 neurology & neurosurgery, Algorithms, Software
- Abstract
Background: Cell-type heterogeneity of tumors is a key factor in tumor progression and response to chemotherapy. Tumor cell-type heterogeneity, defined as the proportion of the various cell types in a tumor, can be inferred from DNA methylation of surgical specimens. However, confounding factors known to associate with methylation values, such as age and sex, complicate accurate inference of cell-type proportions. While reference-free algorithms have been developed to infer cell-type proportions from DNA methylation, a comparative evaluation of the performance of these methods is still lacking.

Results: Here we use simulations to evaluate several computational pipelines based on the software packages MeDeCom, EDec, and RefFreeEWAS. We identify that accounting for confounders, feature selection, and the choice of the number of estimated cell types are critical steps for inferring cell-type proportions. We find that removal of methylation probes that are correlated with confounder variables reduces the error of inference by 30–35%, and that selection of cell-type informative probes has a similar effect. We show that Cattell's rule based on the scree plot is a powerful tool to determine the number of cell types. Once the pre-processing steps are completed, the three deconvolution methods provide comparable results. We observe that the performance of all the algorithms improves when inter-sample variation of cell-type proportions is large or when the number of available samples is large. We find that under specific circumstances the methods are sensitive to the initialization method, suggesting that averaging different solutions or optimizing initialization is an avenue for future research.

Conclusion: Based on the lessons learned, to facilitate pipeline validation and catalyze further pipeline improvement by the community, we develop a benchmark pipeline for inference of cell-type proportions and implement it in the R package medepir.
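The two pre-processing ideas named in the abstract (dropping probes correlated with a confounder, then reading the number of components off a scree plot) can be illustrated with a small toy simulation. The sketch below is in Python with NumPy and is not the medepir implementation. The function names, the correlation threshold of 0.4, the simulated binary `sex` confounder, and the eigenvalue-ratio heuristic used in place of a visual reading of the elbow are all assumptions made for illustration.

```python
import numpy as np


def drop_confounded_probes(D, confounder, max_abs_r=0.4):
    """Keep probes (rows of D) whose |Pearson r| with the confounder is below max_abs_r.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    Dc = D - D.mean(axis=1, keepdims=True)
    c = confounder - confounder.mean()
    r = (Dc @ c) / (np.linalg.norm(Dc, axis=1) * np.linalg.norm(c))
    keep = np.abs(r) < max_abs_r
    return D[keep], keep


def scree(D, n_components=10):
    """Leading PCA eigenvalues of D (probes x samples): the heights of a scree plot."""
    Dc = D - D.mean(axis=1, keepdims=True)       # center each probe across samples
    s = np.linalg.svd(Dc, compute_uv=False)      # singular values, sorted descending
    return s[:n_components] ** 2 / (D.shape[1] - 1)


def components_before_elbow(eigenvalues):
    """Crude automated stand-in for reading the elbow by eye (an assumption):
    cut where the ratio between consecutive eigenvalues is largest."""
    ratios = eigenvalues[:-1] / eigenvalues[1:]
    return int(np.argmax(ratios)) + 1


# Toy data: D = T A + noise, with a confounder effect on the first 50 probes.
rng = np.random.default_rng(0)
n_probes, n_samples, n_types = 500, 200, 3
T = rng.uniform(0.2, 0.8, size=(n_probes, n_types))        # cell-type methylation profiles
A = rng.dirichlet(np.ones(n_types), size=n_samples).T      # proportions, columns sum to 1
sex = rng.integers(0, 2, size=n_samples).astype(float)     # hypothetical binary confounder
D = T @ A + rng.normal(0.0, 0.01, size=(n_probes, n_samples))
D[:50] += 0.3 * (sex - 0.5)                                # confounded probes

k_raw = components_before_elbow(scree(D))
D_filtered, keep = drop_confounded_probes(D, sex)
k_filtered = components_before_elbow(scree(D_filtered))

print("probes kept:", int(keep.sum()), "of", n_probes)
print("components before elbow, unfiltered:", k_raw)
print("components before elbow, filtered:", k_filtered)
```

In this toy setup the confounder is expected to contribute one extra component to the unfiltered scree, which disappears once the correlated probes are removed. Because proportions sum to one, K cell types span at most K − 1 components after centering, so two components here correspond to the three simulated cell types; whether the paper's pipeline applies exactly this offset is not stated in the abstract.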
- Published
- 2020