1. Will code one day run a code? Performance of language models on ACEM primary examinations and implications.
- Author
-
Smith, Jesse, Choi, Philip MC, and Buntine, Paul
- Subjects
-
*NATIONAL competency-based educational tests, *NATURAL language processing, *COMPUTER assisted testing (Education), *ARTIFICIAL intelligence, *COMPARATIVE studies, *PHILOSOPHY of education, *CLINICAL competence, *DESCRIPTIVE statistics, *EMERGENCY medicine, *MEDICAL education
- Abstract
Objective: Large language models (LLMs) have demonstrated mixed results in their ability to pass various specialist medical examinations, and their performance within the field of emergency medicine remains unknown. Methods: We explored the performance of three prevalent LLMs (OpenAI's GPT series, Google's Bard, and Microsoft's Bing Chat) on a practice ACEM primary examination. Results: All LLMs achieved a passing score, with GPT 4.0 outperforming the average candidate. Conclusion: Large language models, by passing the ACEM primary examination, show potential as tools for medical education and practice. However, limitations exist and are discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2023