Using Large Language Models to Generate Script Concordance Test in Medical Education: ChatGPT and Claude
- Author
- Yavuz Selim Kıyak and Emre Emekli
- Subjects
- script concordance test, clinical reasoning, medical education, artificial intelligence, ChatGPT, Medicine, Education
- Abstract
We aimed to determine the quality of AI-generated (ChatGPT-4 and Claude 3) Script Concordance Test (SCT) items through an expert panel. We generated SCT items on abdominal radiology using a complex prompt in two large language model (LLM) chatbots, ChatGPT-4 and Claude 3 Sonnet, in April 2024, and evaluated the items’ quality through an expert panel of 16 radiologists. The expert panel, which was blinded to the origin of the items and received them without modification, independently answered each item and assessed it against 12 quality indicators. Data analysis included descriptive statistics, bar charts to compare responses against accepted forms, and a heatmap to show performance across the quality indicators. Most of the SCT items generated by the chatbots were judged to assess clinical reasoning rather than only factual recall (ChatGPT: 92.50%; Claude: 85.00%). The heatmap indicated that the items were generally acceptable, with most responses favorable across the quality indicators (ChatGPT: 71.77%; Claude: 64.23%). Comparison of the bar charts against acceptable and unacceptable forms showed that 73.33% of the questions in the ChatGPT items and 53.33% of those in the Claude items can be considered acceptable. Using LLMs to generate SCT items can help medical educators by reducing the time and effort required. Although the prompt provides a good starting point, it remains crucial to review and revise AI-generated SCT items before educational use. The prompt and the custom GPT, “Script Concordance Test Generator”, available at https://chatgpt.com/g/g-RlzW5xdc1-script-concordance-test-generator, can streamline SCT item development.
- Published
- 2024