1. How to Apply Variable Selection Machine Learning Algorithms With Multiply Imputed Data: A Missing Discussion
- Author
Gunn, Heather J; Rezvan, Panteha Hayati; Fernández, M Isabel; and Comulada, W Scott
- Subjects
Applied and Developmental Psychology, Social and Personality Psychology, Mathematical Sciences, Statistics, Psychology, Generic health relevance, Good Health and Well Being, Humans, Data Interpretation, Statistical, Algorithms, Linear Models, Research Design, Bias, LASSO, missing data, multiple imputation, regularization, regression, Cognitive Sciences, Social Sciences Methods, Applied and developmental psychology, Social and personality psychology
- Abstract
Psychological researchers often use standard linear regression to identify relevant predictors of an outcome of interest, but challenges emerge with incomplete data and growing numbers of candidate predictors. Regularization methods like the LASSO can reduce the risk of overfitting, increase model interpretability, and improve prediction in future samples; however, handling missing data when using regularization-based variable selection methods is complicated. Using listwise deletion or an ad hoc imputation strategy to deal with missing data when using regularization methods can lead to loss of precision, substantial bias, and a reduction in predictive ability. In this tutorial, we describe three approaches for fitting a LASSO when using multiple imputation to handle missing data and illustrate how to implement these approaches in practice with an applied example. We discuss implications of each approach and describe additional research that would help solidify recommendations for best practices. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
- Published
- 2023