1. High dimensional predictions of suicide risk in 4.2 million US Veterans using ensemble transfer learning
- Author
Dhaubhadel, Sayera, Ganguly, Kumkum, Ribeiro, Ruy M, Cohn, Judith D, Hyman, James M, Hengartner, Nicolas W, Kolade, Beauty, Singley, Anna, Bhattacharya, Tanmoy, Finley, Patrick, Levin, Drew, Thelen, Haedi, Cho, Kelly, Costa, Lauren, Ho, Yuk-Lam, Justice, Amy C, Pestian, John, Santel, Daniel, Zamora-Resendiz, Rafael, Crivelli, Silvia, Tamang, Suzanne, Martins, Susana, Trafton, Jodie, Oslin, David W, Beckham, Jean C, Kimbrel, Nathan A, and McMahon, Benjamin H
- Subjects
Clinical and Health Psychology, Health Sciences, Psychology, Behavioral and Social Science, Suicide, Mental Health, Good Health and Well Being, Humans, Veterans, Retrospective Studies, Carcinoma, Renal Cell, Cross-Sectional Studies, Prospective Studies, Suicide, Attempted, Kidney Neoplasms, Machine Learning, Million Veteran Program Suicide Exemplar Work Group
- Abstract
We present an ensemble transfer learning method to predict suicide from Veterans Affairs (VA) electronic medical records (EMR). A diverse set of base models was trained to predict a binary outcome constructed from reported suicide, suicide attempt, and overdose diagnoses with varying choices of study design and prediction methodology. Each model used twenty cross-sectional and 190 longitudinal variables observed in eight time intervals covering 7.5 years prior to the time of prediction. Ensembles of seven base models were created and fine-tuned with ten variables expected to change with study design and outcome definition in order to predict suicide and the combined outcome in a prospective cohort. The ensemble models achieved c-statistics of 0.73 on 2-year suicide risk and 0.83 on the combined outcome when predicting on a prospective cohort of 4.2 M veterans. The ensembles rely on nonlinear base models trained using a matched retrospective nested case-control (Rcc) study cohort and show good calibration across a diversity of subgroups, including risk strata, age, sex, race, and level of healthcare utilization. In addition, a linear Rcc base model provided a rich set of biological predictors, including indicators of suicide, substance use disorder, mental health diagnoses and treatments, hypoxia and vascular damage, and demographics.
- Published
- 2024
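
The abstract reports model performance as c-statistics (0.73 and 0.83). As a reading aid, the sketch below shows one common way to compute a c-statistic for a binary outcome: the fraction of (case, control) pairs in which the case receives the higher risk score, with ties counted as one half. This is a generic illustration and is not taken from the paper; the function name `c_statistic` and the toy labels and scores are assumptions made up for the example.

```python
from itertools import product


def c_statistic(labels, scores):
    """Concordance statistic for a binary outcome (hypothetical helper).

    labels: iterable of 0/1 outcomes (1 = case, 0 = control).
    scores: iterable of predicted risk scores, same length as labels.
    Returns the share of (case, control) pairs where the case has the
    higher score; tied pairs contribute 0.5.
    """
    pairs = list(zip(labels, scores))
    cases = [s for y, s in pairs if y == 1]
    controls = [s for y, s in pairs if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    concordant = 0.0
    for case_score, control_score in product(cases, controls):
        if case_score > control_score:
            concordant += 1.0
        elif case_score == control_score:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Toy data, invented for illustration only (not from the study).
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.4, 0.4, 0.1]
toy_c = c_statistic(toy_labels, toy_scores)
```

A value of 0.5 corresponds to ranking cases and controls no better than chance, and 1.0 to ranking every case above every control. The pairwise loop is quadratic in cohort size, so it suits small examples; for a cohort of millions a rank-based formulation would be the usual choice.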