1. Development and validation of DNA Methylation scores in two European cohorts augment 10-year risk prediction of type 2 diabetes
- Author
Riccardo E. Marioni, Melanie Waldenberger, Christian Gieger, Kathryn L. Evans, Natalia Szlachetka, Chloe Fawns-Ritchie, Yipeng Cheng, Karla Monterrubio-Gómez, Evgenii Lobzaev, Catalina A. Vallejos, Archie Campbell, David J. Porteous, Andrea Ganna, Andrew M. McIntosh, Timothy I. Cannings, Danni A Gadd, Daniel L. McCartney, Michael J Stam, Rosie M. Walker, Imrich Berta, Yufei Zhang, Cliff Nangle, Annette Peters, and Wolfgang Rathmann
- Subjects
Aging, 0303 health sciences, business.industry, Incidence (epidemiology), Neuroscience (miscellaneous), Area under the curve, Disease, 3. Good health, 03 medical and health sciences, 0302 clinical medicine, Lasso (statistics), DNA methylation, Cohort, Statistics, Trait, Medicine, 030212 general & internal medicine, Geriatrics and Gerontology, business, Precision and recall, 030304 developmental biology
- Abstract
Type 2 diabetes mellitus (T2D) is one of the most prevalent diseases in the world and presents a major health and economic burden, a notable proportion of which could be alleviated with improved early prediction and intervention. While standard risk factors including age, obesity, and hypertension have shown good predictive performance, we show that the use of CpG DNA methylation information leads to a significant improvement in the prediction of 10-year T2D incidence risk. Whilst previous studies have been largely constrained by linear assumptions and the use of CpGs one at a time, we have adopted a more flexible approach based on a range of linear and tree-ensemble models for classification and time-to-event prediction. Using the Generation Scotland cohort (n=9,537), our best-performing model (Area Under the Curve (AUC)=0.880, Precision-Recall AUC (PRAUC)=0.539, McFadden's R²=0.316) used a LASSO Cox proportional-hazards predictor and showed notable improvement in onset prediction, above and beyond standard risk factors (AUC=0.860, PRAUC=0.444, R²=0.261). Replication of the main finding was observed in an external test dataset (the German-based KORA study, p=3.7×10⁻⁴). Tree-ensemble methods provided comparable performance, and future improvements to these models are discussed. Finally, we introduce MethylPipeR, an R package with accompanying user interface, for systematic and reproducible development of complex trait and incident disease predictors. While MethylPipeR was applied to incident T2D prediction with DNA methylation in our experiments, the package is designed for generalised development of predictive models and is applicable to a wide range of omics data and target traits.
- Published
- 2021