1. Artificial Intelligence Large Language Models Address Anterior Cruciate Ligament Reconstruction: Superior Clarity and Completeness by Gemini Compared to ChatGPT-4 in Response to American Academy of Orthopedic Surgeons Clinical Practice Guidelines.
- Author
- Quinn M, Milner JD, Schmitt P, Morrissey P, Lemme N, Marcaccio S, DeFroda S, Tabaddor R, and Owens BD
- Abstract
Purpose: To assess the ability of ChatGPT-4 and Gemini to generate accurate and relevant responses to the 2022 American Academy of Orthopaedic Surgeons (AAOS) clinical practice guidelines (CPG) for anterior cruciate ligament reconstruction (ACLR).
Methods: Responses from ChatGPT-4 and Gemini to prompts derived from all 15 AAOS guidelines were evaluated by seven fellowship-trained orthopedic sports medicine surgeons using a structured questionnaire assessing five key characteristics on a scale from 1 to 5. The prompts were categorized into three areas: Diagnosis and Preoperative Management, Surgical Timing and Technique, and Rehabilitation and Prevention. Statistical analysis included mean scoring, standard deviation, and two-sided t-tests to compare performance between the two large language models (LLMs). Scores were then evaluated for inter-rater reliability (IRR).
Results: Overall, both LLMs performed well, with mean scores > 4 for the five key characteristics. Gemini demonstrated superior performance in overall clarity (4.848 ± 0.36 vs 4.743 ± 0.481, p = 0.034), but all other characteristics demonstrated non-significant differences (p > 0.05). Gemini also demonstrated superior clarity in the Surgical Timing and Technique (p = 0.038) and Rehabilitation and Prevention (p = 0.044) sub-categories. Additionally, Gemini had superior completeness scores in the Rehabilitation and Prevention sub-category (p = 0.044), but no statistically significant differences were found among the other sub-categories. The overall IRR was 0.71 (moderate).
Conclusion: Both Gemini and ChatGPT-4 demonstrate an overall good ability to generate accurate and relevant responses to question prompts based on the 2022 AAOS clinical practice guidelines for ACLR. However, Gemini demonstrated superior clarity in multiple domains, in addition to superior completeness for questions pertaining to rehabilitation and prevention.
Clinical Relevance: This study addresses a gap in the LLM and ACLR literature by comparing the performance of ChatGPT-4 to Gemini, which is growing in popularity with more than 300 million individual uses in May 2024 alone. Moreover, the results demonstrated superior performance of Gemini in both clarity and completeness, which are critical elements of a tool being used by patients for educational purposes. Additionally, this study uses question prompts based on the AAOS CPG, which may serve as a method of standardization for future investigations of LLM platform performance. For these reasons, the authors believe that the results of this study would be of interest to both the readership of Arthroscopy and patients alike.
(Copyright © 2024. Published by Elsevier Inc.)
- Published
- 2024