Natural language processing of head CT reports to identify intracranial mass effect: CTIME algorithm
- Source :
- The American journal of emergency medicine. 51
- Publication Year :
- 2021
Abstract
- BACKGROUND: The Mortality Probability Model (MPM) is used in research and quality improvement to adjust for severity of illness and can also inform triage decisions. However, a limitation to its automated use is that it includes the variable "intracranial mass effect" (IME), which requires human engagement with the electronic health record (EHR). We developed and tested a natural language processing (NLP) algorithm to identify IME from CT head reports. METHODS: We obtained initial CT head reports from adult patients who were admitted to the ICU from our ED between 10/2013 and 9/2016. Each CT head report was labeled yes/no for IME by at least two of five independent labelers. The reports were then randomly divided 80/20 into training and test sets. All reports were preprocessed to remove linguistic and style variability, and a dictionary was created to map similar common terms. We tested three vectorization strategies for converting the report text to a numerical vector: Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and Universal Sentence Encoder. This vector served as the input to a classification-tree-based ensemble machine learning algorithm (XGBoost). After training, model performance was assessed in the test set using the area under the receiver operating characteristic curve (AUROC). We also divided the continuous range of scores into positive/inconclusive/negative categories for IME. RESULTS: Of the 1202 CT reports in the training set, 308 (25.6%) were manually labeled as "yes" for IME. Of the 355 reports in the test set, 108 (30.4%) were labeled as "yes" for IME. The TF-IDF vectorization strategy as an input for the XGBoost model had the best AUROC: 0.9625 (95% CI 0.9443-0.9807). TF-IDF score categories were defined and had the following likelihood ratios: "positive" (TF-IDF score > 0.5) LR = 24.59; "inconclusive" (TF-IDF 0.05-0.5) LR = 0.99; and "negative" (TF-IDF
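The three-way score categories and their likelihood ratios can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the toy data are hypothetical, the 0.05 and 0.5 cut-offs are the ones quoted in the abstract, and treating every score below 0.05 as "negative" is an assumption, because the abstract is cut off before it defines that category. Each likelihood ratio is the proportion of IME-positive reports falling in a category divided by the proportion of IME-negative reports falling in the same category.

```python
from collections import Counter

# Cut-offs quoted in the abstract. Treating scores below LOW_CUT as
# "negative" is an assumption: the abstract is truncated at that point.
LOW_CUT = 0.05
HIGH_CUT = 0.5

CATEGORIES = ("positive", "inconclusive", "negative")


def categorize(score):
    """Map a model score in [0, 1] to a three-way IME category."""
    if score > HIGH_CUT:
        return "positive"
    if score >= LOW_CUT:
        return "inconclusive"
    return "negative"


def interval_likelihood_ratios(scores, labels):
    """Return {category: LR}, LR = P(category | IME) / P(category | no IME).

    labels are 1 (IME present) or 0 (IME absent). A ratio that cannot be
    computed (zero denominator) is reported as None.
    """
    with_ime = Counter()
    without_ime = Counter()
    for score, label in zip(scores, labels):
        target = with_ime if label == 1 else without_ime
        target[categorize(score)] += 1

    n_with = sum(with_ime.values())
    n_without = sum(without_ime.values())

    ratios = {}
    for category in CATEGORIES:
        # (a / n_with) / (b / n_without), rearranged to a single division.
        numerator = with_ime[category] * n_without
        denominator = without_ime[category] * n_with
        ratios[category] = numerator / denominator if denominator else None
    return ratios


# Made-up scores and labels for illustration only (not study data).
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.01, 0.6, 0.2, 0.1, 0.02, 0.03]
toy_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_ratios = interval_likelihood_ratios(toy_scores, toy_labels)
```

A ratio well above 1 means a score in that category raises the odds of IME, a ratio near 1 leaves them essentially unchanged, and a ratio well below 1 lowers them, which matches the roles of the "positive" and "inconclusive" categories reported in the abstract.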
- Subjects :
Machine Learning
False positive paradox
Medicine
Electronic Health Records
Humans
Word2vec
Natural Language Processing
Receiver operating characteristic
Brain Neoplasms
General Medicine
Triage
Ensemble learning
Test (assessment)
Logistic Models
ROC Curve
Test set
Area Under Curve
Emergency Medicine
Artificial intelligence
Tomography, X-Ray Computed
Algorithm
Sentence
Natural language processing
Details
- ISSN :
- 15328171
- Volume :
- 51
- Database :
- OpenAIRE
- Journal :
- The American journal of emergency medicine
- Accession number :
- edsair.doi.dedup.....6819ac3db038d72e038d564c978485ad