Benchmarking LLM chatbots’ oncological knowledge with the Turkish Society of Medical Oncology’s annual board examination questions

Authors :: Efe Cem Erdat
Engin Eren Kavak
Source :: BMC Cancer, Vol 25, Iss 1, Pp 1-7 (2025)
Publication Year :: 2025
Publisher :: BMC, 2025.
Abstract:
Background: Large language models (LLMs) have shown promise in various medical applications, including clinical decision-making and education. In oncology, the increasing complexity of patient care and the vast volume of medical literature require efficient tools to assist practitioners. However, the use of LLMs in oncology education and knowledge assessment remains underexplored. This study aims to evaluate and compare the oncological knowledge of four LLMs using standardized board examination questions.
Methods: We assessed the performance of four LLMs, namely Claude 3.5 Sonnet (Anthropic), ChatGPT 4o (OpenAI), Llama-3 (Meta), and Gemini 1.5 (Google), using the Turkish Society of Medical Oncology’s annual board examination questions from 2016 to 2024. A total of 790 valid multiple-choice questions covering various oncology topics were included. Each model was tested on its ability to answer these questions in Turkish. Performance was analyzed based on the number of correct answers, with statistical comparisons made using chi-square tests and one-way ANOVA.
Results: Claude 3.5 Sonnet outperformed the other models, passing all eight exams with an average score of 77.6%. ChatGPT 4o passed seven out of eight exams, with an average score of 67.8%. Llama-3 and Gemini 1.5 showed lower performance, passing four and three exams respectively, with average scores below 50%. Significant differences were observed among the models’ performances (F = 17.39, p

Subjects :: Artificial intelligence
Large language models
Oncology
Clinical decision support systems
Medical education
Board examinations
Neoplasms. Tumors. Oncology. Including cancer and carcinogens
RC254-282

Details

Language :: English
ISSN :: 14712407
Volume :: 25
Issue :: 1
Database :: Directory of Open Access Journals
Journal :: BMC Cancer
Publication Type :: Academic Journal
Accession number :: edsdoj.2db9e9416815418bb4640a2e8e30114d
Document Type :: article
Full Text :: https://doi.org/10.1186/s12885-025-13596-0
