Transformer versus traditional natural language processing: how much data is enough for automated radiology report classification?

Authors :: Yang E
Li MD
Raghavan S
Deng F
Lang M
Succi MD
Huang AJ
Kalpathy-Cramer J
Source :: The British journal of radiology [Br J Radiol] 2023 Sep; Vol. 96 (1149), pp. 20220769. Date of Electronic Publication: 2023 May 25.
Publication Year :: 2023
Abstract:
Objectives: Current state-of-the-art natural language processing (NLP) techniques use transformer deep-learning architectures, which depend on large training datasets. We hypothesized that traditional NLP techniques may outperform transformers for smaller radiology report datasets.
Methods: We compared the performance of BioBERT, a deep-learning-based transformer model pre-trained on biomedical text, and three traditional machine-learning models (gradient boosted tree, random forest, and logistic regression) on seven classification tasks given free-text radiology reports. Tasks included detection of appendicitis, diverticulitis, bowel obstruction, and enteritis/colitis on abdomen/pelvis CT reports, ischemic infarct on brain CT/MRI reports, and medial and lateral meniscus tears on knee MRI reports (7,204 total annotated reports). The performance of NLP models on held-out test sets was compared after training using the full training set, and 2.5%, 10%, 25%, 50%, and 75% random subsets of the training data.
Results: In all tested classification tasks, BioBERT performed poorly at smaller training sample sizes compared to non-deep-learning NLP models. Specifically, BioBERT required training on approximately 1,000 reports to perform similarly or better than non-deep-learning models. At around 1,250 to 1,500 training samples, the testing performance for all models began to plateau, where additional training data yielded minimal performance gain.
Conclusions: With larger sample sizes, transformer NLP models achieved superior performance in radiology report binary classification tasks. However, with smaller sizes (<1000) and more imbalanced training data, traditional NLP techniques performed better.
Advances in Knowledge: Our benchmarks can help guide clinical NLP researchers in selecting machine-learning models according to their dataset characteristics.
Competing Interests: Declaration of interests: MDS reports personal fees and non-financial support from 2 Minute Medicine, Inc., and patent royalties from Frequency Therapeutics for work not related to this manuscript. JKC reports grants from GE Healthcare, non-financial support from AWS, and grants from Genentech Foundation, outside the submitted work. The other authors report no relevant conflicts of interest.
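
The Methods describe training each model on the full training set and on 2.5%, 10%, 25%, 50%, and 75% random subsets, then scoring on a held-out test set. The Python sketch below illustrates that subsampling protocol only; it is not the authors' code. Several details are assumptions not stated in the abstract: each subset is drawn independently and without stratification, reports are held as (text, label) pairs, and the `majority_baseline` scorer is a hypothetical stand-in for a real model such as logistic regression or BioBERT.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Training-set fractions named in the abstract; 1.0 is the full training set.
FRACTIONS = (0.025, 0.10, 0.25, 0.50, 0.75, 1.0)

# Hypothetical data layout: (report text, binary label).
Example = Tuple[str, int]
Scorer = Callable[[Sequence[Example], Sequence[Example]], float]


def subset_size(n_train: int, fraction: float) -> int:
    """Number of reports in a subset, never fewer than one."""
    return max(1, round(n_train * fraction))


def random_subsets(
    train: Sequence[Example],
    fractions: Sequence[float] = FRACTIONS,
    seed: int = 0,
) -> Dict[float, List[Example]]:
    """Draw one random subset per fraction, sampling without replacement.

    Assumption: subsets are independent draws, not nested, and not
    stratified by label. The abstract does not specify either choice.
    """
    rng = random.Random(seed)
    pool = list(train)
    return {f: rng.sample(pool, subset_size(len(pool), f)) for f in fractions}


def learning_curve(
    train: Sequence[Example],
    test: Sequence[Example],
    fit_and_score: Scorer,
    fractions: Sequence[float] = FRACTIONS,
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Fit on each subset and score on the same held-out test set.

    Returns (number of training reports, test score) pairs.
    """
    subsets = random_subsets(train, fractions, seed)
    return [(len(s), fit_and_score(s, test)) for s in subsets.values()]


def majority_baseline(
    train_subset: Sequence[Example], test: Sequence[Example]
) -> float:
    """Toy scorer: predict the subset's most common label, report accuracy."""
    majority = Counter(label for _, label in train_subset).most_common(1)[0][0]
    return sum(label == majority for _, label in test) / len(test)
```

Passing a different `fit_and_score` callable for each model keeps the subsets and the test set fixed across models, so differences along the curve reflect the model rather than the sample drawn.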

Subjects :: Humans
Tomography, X-Ray Computed/methods
Machine Learning
Magnetic Resonance Imaging
Natural Language Processing
Radiology

Details

Language :: English
ISSN :: 1748-880X
Volume :: 96
Issue :: 1149
Database :: MEDLINE
Journal :: The British journal of radiology
Publication Type :: Academic Journal
Accession number :: 37162253
Full Text :: https://doi.org/10.1259/bjr.20220769
