
Potential of multimodal large language models for data mining of medical images and free-text reports

Authors :: Yutong Zhang
Yi Pan
Tianyang Zhong
Peixin Dong
Kangni Xie
Yuxiao Liu
Hanqi Jiang
Zihao Wu
Zhengliang Liu
Wei Zhao
Wei Zhang
Shijie Zhao
Tuo Zhang
Xi Jiang
Dinggang Shen
Tianming Liu
Xin Zhang
Source :: Meta-Radiology, Vol 2, Iss 4, Pp 100103- (2024)
Publication Year :: 2024
Publisher :: KeAi Communications Co., Ltd., 2024.
Abstract: Medical images and radiology reports are essential for physicians to diagnose medical conditions. However, the vast diversity and cross-source heterogeneity inherent in these data have posed significant challenges to the generalizability of current data-mining methods for clinical decision-making. Recently, multimodal large language models (MLLMs), especially Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models, have revolutionized numerous domains, significantly impacting the medical field. In this study, we conducted a detailed evaluation of the performance of the Gemini series models (including Gemini-1.0-Pro-Vision, Gemini-1.5-Pro, and Gemini-1.5-Flash) and GPT series models (including GPT-4o, GPT-4-Turbo, and GPT-3.5-Turbo) across 14 medical datasets, covering 5 medical imaging categories (dermatology, radiology, dentistry, ophthalmology, and endoscopy) and 3 radiology report datasets. The investigated tasks encompass disease classification, lesion segmentation, anatomical localization, disease diagnosis, report generation, and lesion detection. Moreover, we also validated the performance of the Claude-3-Opus, Yi-Large, Yi-Large-Turbo, and LLaMA 3 models to gain a comprehensive understanding of MLLMs in the medical field. Our experimental results demonstrated that Gemini-series models excelled in report generation and lesion detection but faced challenges in disease classification and anatomical localization. Conversely, GPT-series models exhibited proficiency in lesion segmentation and anatomical localization but encountered difficulties in disease diagnosis and lesion detection. Additionally, both the Gemini series and the GPT series include models that demonstrated commendable generation efficiency. While both model series hold promise for reducing physician workload, alleviating pressure on limited healthcare resources, and fostering collaboration between clinical practitioners and artificial intelligence technologies, substantial improvements and comprehensive validation remain imperative before clinical deployment.

Subjects :: Medical physics. Medical radiology. Nuclear medicine
R895-920

Details

Language :: English
ISSN :: 29501628
Volume :: 2
Issue :: 4
Database :: Directory of Open Access Journals
Journal :: Meta-Radiology
Publication Type :: Academic Journal
Accession number :: edsdoj.6c08c8dfff4f4288804b21c5d93f454a
Document Type :: article
Full Text :: https://doi.org/10.1016/j.metrad.2024.100103
