Developing and Evaluating Large Language Model-Generated Emergency Medicine Handoff Notes.

Authors :: Hartman V
Zhang X
Poddar R
McCarty M
Fortenko A
Sholle E
Sharma R
Campion T Jr
Steel PAD
Source :: JAMA network open [JAMA Netw Open] 2024 Dec 02; Vol. 7 (12), pp. e2448723. Date of Electronic Publication: 2024 Dec 02.
Publication Year :: 2024
Abstract:
Importance: An emergency medicine (EM) handoff note generated by a large language model (LLM) has the potential to reduce physician documentation burden without compromising the safety of EM-to-inpatient (IP) handoffs.
Objective: To develop LLM-generated EM-to-IP handoff notes and evaluate their accuracy and safety compared with physician-written notes.
Design, Setting, and Participants: This cohort study used EM patient medical records with acute hospital admissions that occurred in 2023 at NewYork-Presbyterian/Weill Cornell Medical Center. A customized clinical LLM pipeline was trained, tested, and evaluated to generate templated EM-to-IP handoff notes. Using both conventional automated methods (ie, recall-oriented understudy for gisting evaluation [ROUGE], bidirectional encoder representations from transformers score [BERTScore], and source chunking approach for large-scale inconsistency evaluation [SCALE]) and a novel patient safety-focused framework, LLM-generated handoff notes vs physician-written notes were compared. Data were analyzed from October 2023 to March 2024.
Exposure: LLM-generated EM handoff notes.
Main Outcomes and Measures: LLM-generated handoff notes were evaluated for (1) lexical similarity with respect to physician-written notes using ROUGE and BERTScore; (2) fidelity with respect to source notes using SCALE; and (3) readability, completeness, curation, correctness, usefulness, and implications for patient safety using a novel framework.
Results: In this study of 1600 EM patient records (832 [52%] female and mean [SD] age of 59.9 [18.9] years), LLM-generated handoff notes, compared with physician-written ones, had higher ROUGE (0.322 vs 0.088), BERTScore (0.859 vs 0.796), and SCALE scores (0.691 vs 0.456), indicating the LLM-generated summaries exhibited greater similarity and more detail. As reviewed by 3 board-certified EM physicians, a subsample of 50 LLM-generated summaries had a mean (SD) usefulness score of 4.04 (0.86) out of 5 (compared with 4.36 [0.71] for physician-written) and mean (SD) patient safety scores of 4.06 (0.86) out of 5 (compared with 4.50 [0.56] for physician-written). None of the LLM-generated summaries were classified as a critical patient safety risk.
Conclusions and Relevance: In this cohort study of 1600 EM patient medical records, LLM-generated EM-to-IP handoff notes were determined superior compared with physician-written summaries via conventional automated evaluation methods, but marginally inferior in usefulness and safety via a novel evaluation framework. This study suggests the importance of a physician-in-loop implementation design for this model and demonstrates an effective strategy to measure preimplementation patient safety of LLM models.
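
Illustrative note on the lexical-similarity measure: the abstract reports ROUGE scores but does not say which ROUGE variant or tokenizer was used. The sketch below is therefore only an assumption-laden illustration of how an n-gram overlap score of this family can be computed. The function name `rouge_n`, the whitespace tokenization, and the example sentences are hypothetical and are not taken from the study.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """Simplified ROUGE-N: n-gram overlap between a candidate and a reference.

    Assumption: lowercase whitespace tokenization, no stemming or stopword
    handling. The study's actual preprocessing is not given in the abstract.
    """

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example text, not drawn from any patient record.
scores = rouge_n(
    candidate="patient admitted with chest pain",
    reference="patient admitted for chest pain and dyspnea",
)
print(scores)
```

A score of this kind only measures word overlap with a reference note, which may help explain why the study paired automated metrics with physician review of usefulness and patient safety.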

Subjects :: Humans
Female
Male
Middle Aged
Cohort Studies
Documentation methods
Documentation standards
Patient Safety standards
Adult
Natural Language Processing
Aged
Patient Handoff standards
Electronic Health Records standards
Emergency Medicine methods
Emergency Medicine standards

Details

Language :: English
ISSN :: 2574-3805
Volume :: 7
Issue :: 12
Database :: MEDLINE
Journal :: JAMA network open
Publication Type :: Academic Journal
Accession number :: 39625719
Full Text :: https://doi.org/10.1001/jamanetworkopen.2024.48723
