A comparative study of large language model-based zero-shot inference and task-specific supervised classification of breast cancer pathology reports.

Authors :: Sushil M
Zack T
Mandair D
Zheng Z
Wali A
Yu YN
Quan Y
Lituiev D
Butte AJ
Source :: Journal of the American Medical Informatics Association : JAMIA [J Am Med Inform Assoc] 2024 Oct 01; Vol. 31 (10), pp. 2315-2327.
Publication Year :: 2024
Abstract: Objective: Although supervised machine learning is popular for information extraction from clinical notes, creating large annotated datasets requires extensive domain expertise and is time-consuming. Meanwhile, large language models (LLMs) have demonstrated promising transfer learning capability. In this study, we explored whether recent LLMs could reduce the need for large-scale data annotations.
Materials and Methods: We curated a dataset of 769 breast cancer pathology reports, manually labeled with 12 categories, to compare zero-shot classification capability of the following LLMs: GPT-4, GPT-3.5, Starling, and ClinicalCamel, with task-specific supervised classification performance of 3 models: random forests, long short-term memory networks with attention (LSTM-Att), and the UCSF-BERT model.
Results: Across all 12 tasks, the GPT-4 model performed either significantly better than or as well as the best supervised model, LSTM-Att (average macro F1-score of 0.86 vs 0.75), with advantage on tasks with high label imbalance. Other LLMs demonstrated poor performance. Frequent GPT-4 error categories included incorrect inferences from multiple samples and from history, and complex task design, and several LSTM-Att errors were related to poor generalization to the test set.
Discussion: On tasks where large annotated datasets cannot be easily collected, LLMs can reduce the burden of data labeling. However, if the use of LLMs is prohibitive, the use of simpler models with large annotated datasets can provide comparable results.
Conclusions: GPT-4 demonstrated the potential to speed up the execution of clinical NLP studies by reducing the need for large annotated datasets. This may increase the utilization of NLP-based variables and outcomes in clinical studies.
(© The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association.)

Subjects :: Humans
Female
Natural Language Processing
Datasets as Topic
Electronic Health Records
Data Mining methods
Breast Neoplasms pathology
Breast Neoplasms classification
Supervised Machine Learning

Details

Language :: English
ISSN :: 1527-974X
Volume :: 31
Issue :: 10
Database :: MEDLINE
Journal :: Journal of the American Medical Informatics Association : JAMIA
Publication Type :: Academic Journal
Accession number :: 38900207
Full Text :: https://doi.org/10.1093/jamia/ocae146
