
Robust SNP-based prediction of rheumatoid arthritis through machine-learning-optimized polygenic risk score

Authors :
Ashley J. W. Lim
C. Tera Tyniana
Lee Jin Lim
Justina Wei Lynn Tan
Ee Tzun Koh
TTSH Rheumatoid Arthritis Study Group
Samuel S. Chong
Chiea Chuen Khor
Khai Pang Leong
Caroline G. Lee
Source :
Journal of Translational Medicine, Vol 21, Iss 1, Pp 1-17 (2023)
Publication Year :
2023
Publisher :
BMC, 2023.

Abstract

Background The popular statistics-based genome-wide association studies (GWAS) have provided deep insights into the field of complex disorder genetics. However, their clinical applicability to predict disease/trait outcomes remains unclear, as statistical models are not designed to make predictions. This study employs a statistics-free, machine-learning (ML)-optimized polygenic risk score (PRS) to complement existing GWAS and bring the prediction of disease/trait outcomes closer to clinical application. Rheumatoid arthritis (RA) was selected as a model disease to demonstrate the robustness of ML in disease prediction, as RA is a prevalent chronic inflammatory joint disease with high mortality rates that affects adults in their economic prime. Early identification of at-risk individuals may facilitate measures to mitigate the effects of the disease.
Methods This study employs a robust ML feature selection algorithm to identify single nucleotide polymorphisms (SNPs) that can predict RA from a set of training data comprising RA patients and population control samples. Thereafter, the selected SNPs were evaluated for their predictive performance across 3 independent, unseen test datasets. The selected SNPs were subsequently used to generate a PRS, which was also evaluated for its predictive capacity as a sole feature.
Results Through robust ML feature selection, 9 SNPs were found to be the minimum number of features for excellent predictive performance (AUC > 0.9) in 3 independent, unseen test datasets. The PRS based on these 9 SNPs was significantly associated with, and predictive (AUC > 0.9) of, RA in the 3 unseen datasets. An RA ML-PRS calculator of these 9 SNPs was developed ( https://xistance.shinyapps.io/prs-ra/ ) to facilitate individualized clinical applicability. The majority of the predictive SNPs are protective, reside in non-coding regions, and are either predicted to be potentially functional SNPs (pfSNPs) or in high linkage disequilibrium (r² > 0.8) with un-interrogated pfSNPs.
Conclusions These findings highlight the promise of this ML strategy to identify useful genetic features that can robustly predict disease and are amenable to translation for clinical application.
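
The abstract does not spell out how the PRS is computed or scored, so the following is only a minimal sketch of the generic idea: a PRS as a weighted sum of effect-allele dosages, evaluated as a sole feature by AUC. It is not the authors' ML-optimized procedure. The function names, the three weights, and the five toy samples are all hypothetical and chosen only for illustration; the negative weights simply mimic the abstract's remark that most predictive SNPs are protective.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[int], weights: Sequence[float]) -> float:
    """Generic PRS: sum over SNPs of (effect-allele dosage * per-SNP weight)."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per SNP is required")
    if any(d not in (0, 1, 2) for d in dosages):
        raise ValueError("dosages must be 0, 1 or 2 copies of the effect allele")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (case, control) pairs in which the case scores
    higher than the control; tied pairs count as half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical weights for three made-up SNPs (negative = protective allele).
WEIGHTS = [-0.5, -0.25, 0.75]

# Hypothetical genotypes (rows = individuals, columns = SNP dosages).
GENOTYPES = [
    [0, 0, 2],
    [1, 0, 1],
    [2, 1, 0],
    [2, 2, 0],
    [0, 0, 1],
]
LABELS = [1, 1, 0, 0, 0]  # 1 = case, 0 = control (toy labels, not real data)

SCORES = [polygenic_risk_score(g, WEIGHTS) for g in GENOTYPES]
TOY_AUC = auc(SCORES, LABELS)

if __name__ == "__main__":
    print("PRS per individual:", SCORES)
    print("AUC of PRS as sole feature:", TOY_AUC)
```

In a real analysis the weights would come from the trained model and the AUC would be computed on held-out test datasets, as the Methods describe; the pairwise AUC above is the simplest exact form and is fine for small samples.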

Details

Language :
English
ISSN :
14795876
Volume :
21
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Journal of Translational Medicine
Publication Type :
Academic Journal
Accession number :
edsdoj.87deb690648f4aca92310084051267ce
Document Type :
article
Full Text :
https://doi.org/10.1186/s12967-023-03939-5