
Identifying the most crucial factors associated with depression based on interpretable machine learning: a case study from CHARLS.

Authors :
Rulin Li
Xueyan Wang
Lanjun Luo
Youwei Yuan
Source :
Frontiers in Psychology; 2024, p1-12, 12p
Publication Year :
2024

Abstract

Background: Depression is one of the most common mental illnesses among middle-aged and older adults in China. Identifying the crucial factors that lead to depression is important for effectively controlling and reducing depression risk. Currently, few methods are available to accurately predict the risk of depression and identify the crucial factors that influence it. Methods: We collected data on 25,586 samples from the Harmonized China Health and Retirement Longitudinal Study (CHARLS), and the latest records, from 2018, were included in the current cross-sectional analysis. Ninety-three input variables in the survey were considered as potential influential features. Five machine learning (ML) models were used: CatBoost, eXtreme Gradient Boosting (XGBoost), Gradient Boosting Decision Tree (GBDT), Random Forest (RF), and Light Gradient Boosting Machine (LightGBM). These models were compared with a traditional multivariable Linear Regression (LR) model. In addition, SHapley Additive exPlanations (SHAP) were used to identify key influencing factors at the global level and to explain individual heterogeneity through instance-level analysis. To explore how different factors are nonlinearly associated with the risk of depression, we applied the Accumulated Local Effects (ALE) approach to the identified critical variables while controlling for other covariates. Results: CatBoost outperformed the other machine learning models on mean absolute error (MAE), mean squared error (MSE), median absolute error (MedAE), and R². The top three crucial factors identified by SHAP were r4satlife, r4slfmem, and r4shlta, representing life satisfaction, self-reported memory, and health status, respectively. Conclusion: This study demonstrates that the CatBoost model is an appropriate choice for predicting depression among middle-aged and older adults in Harmonized CHARLS. The interpretable SHAP and ALE methods identified crucial factors and their nonlinear relationships with depression, which warrant the attention of domain experts. [ABSTRACT FROM AUTHOR]
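The abstract compares models on MAE, MSE, MedAE, and R² and uses ALE to describe nonlinear associations. The following is a minimal, dependency-free Python sketch (Python 3.8+) of how those quantities are typically computed. It is an illustration under stated assumptions, not the authors' code: the function names `regression_metrics`, `ale_1d`, and `toy_model`, the quantile grid with `n_bins=4`, and the toy data are all hypothetical, and the linear `toy_model` merely stands in for a fitted model such as CatBoost.

```python
import bisect
import statistics


def regression_metrics(y_true, y_pred):
    """MAE, MSE, MedAE and R^2 for observed vs. predicted scores."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    squared = [e * e for e in errors]
    mean_true = statistics.fmean(y_true)
    ss_res = sum(squared)                               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "MAE": statistics.fmean(abs_errors),
        "MSE": statistics.fmean(squared),
        "MedAE": statistics.median(abs_errors),
        "R2": 1.0 - ss_res / ss_tot,
    }


def ale_1d(predict, X, j, n_bins=4):
    """Centered first-order ALE of feature j for a black-box predict(row)."""
    xs = sorted(row[j] for row in X)
    n = len(xs)
    # Quantile grid z_0 < ... < z_K taken from observed values (duplicates dropped).
    edges = sorted({xs[round(k * (n - 1) / n_bins)] for k in range(n_bins + 1)})
    K = len(edges) - 1
    diffs, counts = [0.0] * K, [0] * K
    for row in X:
        # Interval k holds values in (z_k, z_{k+1}]; the minimum joins interval 0.
        k = max(bisect.bisect_left(edges, row[j]) - 1, 0)
        lo, hi = list(row), list(row)
        lo[j], hi[j] = edges[k], edges[k + 1]
        # Local effect: prediction change across the interval, other features fixed.
        diffs[k] += predict(hi) - predict(lo)
        counts[k] += 1
    # Accumulate the mean local effect of each interval.
    ale = [0.0]
    for k in range(K):
        ale.append(ale[-1] + (diffs[k] / counts[k] if counts[k] else 0.0))
    # Center so the count-weighted mean effect over the data is zero.
    mean_effect = sum(counts[k] * (ale[k] + ale[k + 1]) / 2 for k in range(K)) / n
    return edges, [a - mean_effect for a in ale]


def toy_model(row):
    # Hypothetical stand-in for a fitted model (not CatBoost): a linear score.
    return 2.0 * row[0] + row[1]


X = [[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 1.0], [4.0, 2.0]]

metrics = regression_metrics([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print({name: round(value, 4) for name, value in metrics.items()})
# {'MAE': 0.5, 'MSE': 0.375, 'MedAE': 0.5, 'R2': 0.9486}

edges, ale = ale_1d(toy_model, X, j=0)
print(edges)                         # [0.0, 1.0, 2.0, 3.0, 4.0]
print([round(a, 2) for a in ale])    # [-3.4, -1.4, 0.6, 2.6, 4.6]
```

For the linear toy model, the ALE curve rises by 2.0 per unit of feature 0, recovering its coefficient. A nonlinear model would instead yield a bent curve, which is the kind of shape ALE is used to reveal. Because each instance is only moved within its own interval, ALE avoids evaluating the model on unrealistic feature combinations when covariates are correlated. In practice, scikit-learn's `mean_absolute_error`, `mean_squared_error`, `median_absolute_error`, and `r2_score` compute the same metrics.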

Details

Language :
English
ISSN :
1664-1078
Database :
Complementary Index
Journal :
Frontiers in Psychology
Publication Type :
Academic Journal
Accession number :
179006904
Full Text :
https://doi.org/10.3389/fpsyg.2024.1392240