
Performance of ChatGPT on the Taiwan urology board examination: insights into current strengths and shortcomings.

Authors :
Tsai, Chung-You
Hsieh, Shang-Ju
Huang, Hung-Hsiang
Deng, Juinn-Horng
Huang, Yi-You
Cheng, Pai-Yu
Source :
World Journal of Urology. 2024, Vol. 42 Issue 1, p1-9. 9p.
Publication Year :
2024

Abstract

Purpose: To compare the performance of ChatGPT-4 and ChatGPT-3.5 on the Taiwan urology board examination (TUBE), focusing on answer accuracy, explanation consistency, and uncertainty-management tactics to minimize score penalties from incorrect responses across 12 urology domains.

Methods: 450 multiple-choice questions from the TUBE (2020–2022) were presented to the two models. Three urologists assessed the correctness and consistency of each response. Accuracy was defined as the proportion of correct answers; consistency as the proportion of responses with logical and coherent explanations. A penalty-reduction experiment with prompt variations was also conducted. Univariate logistic regression was applied for subgroup comparisons.

Results: ChatGPT-4 showed strengths in urology, achieving an overall accuracy of 57.8%, with annual accuracies of 64.7% (2020), 58.0% (2021), and 50.7% (2022), significantly surpassing ChatGPT-3.5 (33.8%; OR = 2.68, 95% CI [2.05–3.52]). Judged on accuracy alone, it would have passed the TUBE written exams, but it failed on the final score once penalties for incorrect answers were applied. ChatGPT-4's accuracy declined over time and varied across the 12 urological domains, with more frequently updated knowledge domains showing lower accuracy (53.2% vs. 62.2%, OR = 0.69, p = 0.05). A high consistency rate of 91.6% in explanations across all domains indicates reliable delivery of coherent and logical information. A simple prompt outperformed strategy-based prompts in accuracy (60% vs. 40%, p = 0.016), highlighting ChatGPT's inability to accurately self-assess uncertainty and its tendency toward overconfidence, which may hinder medical decision-making.

Conclusions: ChatGPT-4's high accuracy and consistent explanations on the urology board examination demonstrate its potential for medical information processing. However, its limitations in self-assessment and its overconfidence call for caution in its application, especially by inexperienced users. These insights call for ongoing development of urology-specific AI tools. [ABSTRACT FROM AUTHOR]
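As a quick check, the odds ratios quoted above can be recovered directly from the paired accuracies, using odds = p/(1 − p). This is a verification sketch by the reader, not a calculation from the paper itself:

\[
\mathrm{OR}_{\text{GPT-4 vs. GPT-3.5}} = \frac{0.578/(1-0.578)}{0.338/(1-0.338)} = \frac{1.370}{0.511} \approx 2.68,
\qquad
\mathrm{OR}_{\text{updated vs. stable domains}} = \frac{0.532/(1-0.532)}{0.622/(1-0.622)} = \frac{1.137}{1.645} \approx 0.69
\]

Both values agree with the odds ratios reported in the abstract.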

Details

Language :
English
ISSN :
0724-4983
Volume :
42
Issue :
1
Database :
Academic Search Index
Journal :
World Journal of Urology
Publication Type :
Academic Journal
Accession number :
176793569
Full Text :
https://doi.org/10.1007/s00345-024-04957-8