Concordance between humans and GPT-4 in appraising the methodological quality of case reports and case series using the Murad tool.

Authors :: Tarakji, Zin
Kanaan, Adel
Saadi, Samer
Firwana, Mohammed
Allababidi, Adel Kabbara
Abusalih, Mohamed F.
Basmaci, Rami
Rajjo, Tamim I.
Wang, Zhen
Murad, M. Hassan
Hasan, Bashar
Source :: BMC Medical Research Methodology. 11/4/2024, Vol. 24 Issue 1, p1-5. 5p.
Publication Year :: 2024
Abstract: Background: Assessing the methodological quality of case reports and case series is challenging due to human judgment variability and time constraints. We evaluated the agreement in judgments between human reviewers and GPT-4 when applying a standard methodological quality assessment tool designed for case reports and series. Methods: We searched Scopus for systematic reviews published in 2023–2024 that cited the appraisal tool by Murad et al. A GPT-4 based agent was developed to assess the methodological quality using the 8 signaling questions of the tool. Observed agreement and an agreement coefficient were estimated by comparing the published judgments of human reviewers to the GPT-4 assessments. Results: We included 797 case reports and series. The observed agreement ranged between 41.91% and 80.93% across the eight questions (agreement coefficient ranged from 25.39% to 79.72%). The lowest agreement was noted in the first signaling question about selection of cases. The agreement was similar in articles published in journals with impact factor < 5 vs. ≥ 5, and when excluding systematic reviews that did not use 3 causality questions. Repeating the analysis using the same prompts demonstrated high agreement between the two GPT-4 attempts except for the first question about selection of cases. Conclusions: The study demonstrates moderate agreement between GPT-4 and human reviewers in assessing the methodological quality of case series and reports using the Murad tool. The current performance of GPT-4 seems promising but is unlikely to be sufficient for the rigor of a systematic review, so pairing the model with a human reviewer is required. [ABSTRACT FROM AUTHOR]
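
Illustrative note (not part of the author abstract): the two quantities reported above, observed agreement and a chance-corrected agreement coefficient, can be sketched for a single signaling question as follows. The abstract does not name the coefficient the authors used, so Cohen's kappa appears here only as a familiar stand-in and may differ from the published statistic. The function names and the toy ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def observed_agreement(human, model):
    """Fraction of items on which the two raters gave the same judgment."""
    if len(human) != len(model) or not human:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(h == m for h, m in zip(human, model))
    return matches / len(human)


def cohen_kappa(human, model):
    """Chance-corrected agreement (Cohen's kappa), used as an assumed stand-in.

    The abstract only says "agreement coefficient"; the study's actual
    coefficient may be a different statistic.
    """
    n = len(human)
    p_o = observed_agreement(human, model)
    human_counts = Counter(human)
    model_counts = Counter(model)
    categories = set(human) | set(model)
    # Expected agreement if both raters judged independently
    # with their own marginal frequencies.
    p_e = sum(human_counts[c] * model_counts[c] for c in categories) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical category throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical judgments for one signaling question (not study data).
human_ratings = ["yes", "yes", "no", "no"]
gpt4_ratings = ["yes", "no", "no", "no"]

print(observed_agreement(human_ratings, gpt4_ratings))
print(cohen_kappa(human_ratings, gpt4_ratings))
```

In this toy case the raters match on three of four items, and the coefficient is lower than the raw agreement because part of that agreement is expected by chance. That is the same pattern as in the reported ranges, where the coefficient sits below the observed agreement.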

Subjects :: *GENERATIVE pre-trained transformers
*ARTIFICIAL intelligence
*JUDGMENT (Psychology)
*PERIODICAL articles
*PERIODICAL publishing

Details

Language :: English
ISSN :: 14712288
Volume :: 24
Issue :: 1
Database :: Academic Search Index
Journal :: BMC Medical Research Methodology
Publication Type :: Academic Journal
Accession number :: 180654047
Full Text :: https://doi.org/10.1186/s12874-024-02372-6