
Analyzing missingness patterns in real-world data using the SMDI toolkit: application to a linked EHR-claims pharmacoepidemiology study

Authors :
Sudha R. Raman
Bradley G. Hammill
Pamela A. Shaw
Hana Lee
Sengwee Toh
John G. Connolly
Kimberly J. Dandreo
Vinit Nalawade
Fang Tian
Wei Liu
Jie Li
José J. Hernández-Muñoz
Robert J. Glynn
Rishi J. Desai
Janick Weberpals
Source :
BMC Medical Research Methodology, Vol 24, Iss 1, Pp 1-14 (2024)
Publication Year :
2024
Publisher :
BMC, 2024.

Abstract

Background: Missing data in confounding variables present a frequent challenge in generating evidence using real-world data, including electronic health records (EHR). Our objective was to apply a recently published toolkit for characterizing missing data patterns and, based on the toolkit's results about likely missingness mechanisms, to illustrate the decision-making process for analyses in an empirical case example.

Methods: We utilized the Structural Missing Data Investigations (SMDI) toolkit to characterize missing data patterns in the context of a pharmacoepidemiology study comparing cardiovascular outcomes of initiating sodium-glucose-cotransporter-2 inhibitors (SGLT2i) and dipeptidyl peptidase-4 inhibitors (DPP-4i) among older adults. The study used a linked EHR-Medicare claims dataset from Duke Health patients (2015–2017), focusing on partially observed confounders from EHR data (HbA1c lab and body mass index [BMI] values). Our analysis incorporated SMDI's descriptive functions and diagnostic tests to explore missingness patterns and determine missingness mitigation approaches. We used findings from these investigations to inform estimation of adjusted hazard ratios comparing the two classes of medications.

Results: High levels of missingness were noted for important confounding variables including HbA1c (63.6%) and BMI (16.5%). Diagnostic tests produced output describing: 1) the distributions of patient characteristics, exposure, and outcome between patients with and without an observed value of the partially observed covariate, 2) the ability to predict missingness based on observed covariates, and 3) whether the missingness of a partially observed covariate is differential with respect to the outcome. There was evidence that missingness could be sufficiently described using observed data, which allowed multiple imputation by chained equations using random forests to address missing confounder data in estimating treatment effects. Multiple imputation improved the alignment of effect estimates with those of previous studies.

Conclusions: We were able to demonstrate the practical application of the SMDI toolkit in a real-world setting. Application of the SMDI toolkit and the resulting insights into potential missingness patterns can inform the choice of appropriate analytic methods and increase transparency of research methods in handling missing data. This type of approach can inform analytic decision making and may increase our ability to generate evidence from real-world data.
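
The first diagnostic listed in the Results compares patient characteristics between patients with and without an observed value of a partially observed covariate. One common way to summarize such a comparison is the absolute standardized mean difference. The following is a minimal Python sketch of that idea only; it is not the SMDI toolkit's own code or API, and the function name and the toy data are hypothetical.

```python
import math
import statistics


def abs_standardized_mean_difference(values, is_missing):
    """Absolute standardized mean difference of a fully observed covariate
    between rows where another covariate is missing and rows where it is observed.

    Hypothetical helper for illustration; not part of any published toolkit.
    """
    missing_group = [v for v, m in zip(values, is_missing) if m]
    observed_group = [v for v, m in zip(values, is_missing) if not m]

    mean_diff = statistics.mean(missing_group) - statistics.mean(observed_group)
    # Pool the two sample variances with equal weight, a common convention.
    pooled_sd = math.sqrt(
        (statistics.variance(missing_group) + statistics.variance(observed_group)) / 2
    )
    if pooled_sd == 0:
        return 0.0  # no spread in either group, so no standardized difference
    return abs(mean_diff) / pooled_sd


# Toy data (made up): age of eight patients and whether HbA1c is missing.
age = [70, 72, 74, 76, 66, 68, 70, 72]
hba1c_missing = [True, True, True, True, False, False, False, False]

asmd = abs_standardized_mean_difference(age, hba1c_missing)
print(f"ASMD for age by HbA1c missingness: {asmd:.3f}")
```

A large value for a given characteristic would suggest that missingness is related to that characteristic, which is the kind of signal the descriptive comparison is meant to surface.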

Details

Language :
English
ISSN :
14712288
Volume :
24
Issue :
1
Database :
Directory of Open Access Journals
Journal :
BMC Medical Research Methodology
Publication Type :
Academic Journal
Accession number :
edsdoj.4ae0b29ad6134def83c0064c37302018
Document Type :
article
Full Text :
https://doi.org/10.1186/s12874-024-02330-2