Building large-scale registries from unstructured clinical notes using a low-resource natural language processing pipeline.

Authors :
Tavabi N
Pruneski J
Golchin S
Singh M
Sanborn R
Heyworth B
Landschaft A
Kimia A
Kiapour A
Source :
Artificial intelligence in medicine [Artif Intell Med] 2024 May; Vol. 151, pp. 102847. Date of Electronic Publication: 2024 Mar 22.
Publication Year :
2024

Abstract

Building clinical registries is an important step in clinical research and improvement of patient care quality. Natural Language Processing (NLP) methods have shown promising results in extracting valuable information from unstructured clinical notes. However, the structure and nature of clinical notes are very different from the regular text on which state-of-the-art NLP models are trained and tested, and they present their own set of challenges. In this study, we propose Sentence Extractor with Keywords (SE-K), an efficient and interpretable classification approach for extracting information from clinical notes, and show that it outperforms more computationally expensive methods in text classification. Following Institutional Review Board (IRB) approval, we used SE-K and two embedding-based NLP approaches (Sentence Extractor with Embeddings (SE-E) and Bidirectional Encoder Representations from Transformers (BERT)) to develop a comprehensive registry of anterior cruciate ligament surgeries from 20 years of unstructured clinical data at a multi-site tertiary-care regional children's hospital. The low-resource approach (SE-K) performed better (average AUROC of 0.94 ± 0.04) than the embedding-based approaches (SE-E: 0.93 ± 0.04 and BERT: 0.87 ± 0.09) in out-of-sample validation, and it showed the smallest performance drop between test and out-of-sample validation. Moreover, the SE-K approach was at least six times faster (on CPU) than SE-E (on CPU) and BERT (on GPU), and it provides interpretability. Our proposed approach, SE-K, can be effectively used to extract relevant variables from clinical notes to build large-scale registries, with consistently better performance than the more resource-intensive approaches (e.g., BERT). Such approaches can facilitate information extraction from unstructured notes for registry building, quality improvement and adverse event monitoring.

Competing Interests: Declaration of competing interest: No conflicts of interest relevant to the topic of this study.

(Copyright © 2024 Elsevier B.V. All rights reserved.)

Details

Language :
English
ISSN :
1873-2860
Volume :
151
Database :
MEDLINE
Journal :
Artificial intelligence in medicine
Publication Type :
Academic Journal
Accession number :
38658131
Full Text :
https://doi.org/10.1016/j.artmed.2024.102847