Cohort design and natural language processing to reduce bias in electronic health records research

Authors :: Shaan Khurshid
Christopher Reeder
Lia X. Harrington
Pulkit Singh
Gopal Sarma
Samuel F. Friedman
Paolo Di Achille
Nathaniel Diamant
Jonathan W. Cunningham
Ashby C. Turner
Emily S. Lau
Julian S. Haimovich
Mostafa A. Al-Alusi
Xin Wang
Marcus D. R. Klarqvist
Jeffrey M. Ashburner
Christian Diedrich
Mercedeh Ghadessi
Johanna Mielke
Hanna M. Eilken
Alice McElhinney
Andrea Derix
Steven J. Atlas
Patrick T. Ellinor
Anthony A. Philippakis
Christopher D. Anderson
Jennifer E. Ho
Puneet Batra
Steven A. Lubitz
Source :: NPJ digital medicine. 5(1)
Publication Year :: 2021
Abstract: Electronic health record (EHR) datasets are statistically powerful but are subject to ascertainment bias and missingness. Using the Mass General Brigham multi-institutional EHR, we approximated a community-based cohort by sampling patients receiving longitudinal primary care between 2001 and 2018 (Community Care Cohort Project [C3PO], n = 520,868). We utilized natural language processing (NLP) to recover vital signs from unstructured notes. We assessed the validity of C3PO by deploying established risk models for myocardial infarction/stroke and atrial fibrillation. We then compared C3PO to Convenience Samples including all individuals from the same EHR with complete data, but without a longitudinal primary care requirement. NLP reduced the missingness of vital signs by 31%. NLP-recovered vital signs were highly correlated with values derived from structured fields (Pearson r range 0.95–0.99). Atrial fibrillation and myocardial infarction/stroke incidence were lower and risk models were better calibrated in C3PO than in the Convenience Samples (calibration error range for myocardial infarction/stroke: 0.012–0.030 in C3PO vs. 0.028–0.046 in Convenience Samples; calibration error for atrial fibrillation: 0.028 in C3PO vs. 0.036 in Convenience Samples). Sampling patients receiving regular primary care and using NLP to recover missing data may reduce bias and maximize generalizability of EHR research.

Subjects :: Health Information Management
Medicine (miscellaneous)
Health Informatics
Computer Science Applications

Details

ISSN :: 23986352
Volume :: 5
Issue :: 1
Database :: OpenAIRE
Journal :: NPJ digital medicine
Accession number :: edsair.doi.dedup.....a195990a37e80d9154bdd0302f902099

