Comparing GPT-4 and Human Researchers in Health Care Data Analysis: Qualitative Description Study.

Authors :: Li KD
Fernandez AM
Schwartz R
Rios N
Carlisle MN
Amend GM
Patel HV
Breyer BN
Source :: Journal of medical Internet research [J Med Internet Res] 2024 Aug 21; Vol. 26, pp. e56500. Date of Electronic Publication: 2024 Aug 21.
Publication Year :: 2024
Abstract:
Background: Large language models including GPT-4 (OpenAI) have opened new avenues in health care and qualitative research. Traditional qualitative methods are time-consuming and require expertise to capture nuance. Although large language models have demonstrated enhanced contextual understanding and inferencing compared with traditional natural language processing, their performance in qualitative analysis versus that of humans remains unexplored.
Objective: We evaluated the effectiveness of GPT-4 versus human researchers in qualitative analysis of interviews with patients with adult-acquired buried penis (AABP).
Methods: Qualitative data were obtained from semistructured interviews with 20 patients with AABP. Human analysis involved a structured 3-stage process: initial observations, line-by-line coding, and consensus discussions to refine themes. In contrast, artificial intelligence (AI) analysis with GPT-4 underwent two phases: (1) a naïve phase, where GPT-4 outputs were independently evaluated by a blinded reviewer to identify themes and subthemes and (2) a comparison phase, where AI-generated themes were compared with human-identified themes to assess agreement. We used a general qualitative description approach.
Results: The study population (N=20) comprised predominantly White (17/20, 85%), married (12/20, 60%), heterosexual (19/20, 95%) men, with a mean age of 58.8 years and BMI of 41.1 kg/m². Human qualitative analysis identified "urinary issues" in 95% (19/20) and GPT-4 in 75% (15/20) of interviews, with the subtheme "spray or stream" noted in 60% (12/20) and 35% (7/20), respectively. "Sexual issues" were prominent (19/20, 95% humans vs 16/20, 80% GPT-4), although humans identified a wider range of subthemes, including "pain with sex or masturbation" (7/20, 35%) and "difficulty with sex or masturbation" (4/20, 20%). Both analyses similarly highlighted "mental health issues" (11/20, 55%, both), although humans coded "depression" more frequently (10/20, 50% humans vs 4/20, 20% GPT-4). Humans frequently cited "issues using public restrooms" (12/20, 60%) as impacting social life, whereas GPT-4 emphasized "struggles with romantic relationships" (9/20, 45%). "Hygiene issues" were consistently recognized (14/20, 70% humans vs 13/20, 65% GPT-4). Humans uniquely identified "contributing factors" as a theme in all interviews. There was moderate agreement between human and GPT-4 coding (κ=0.401). Reliability assessments of GPT-4's analyses showed consistent coding for themes including "body image struggles," "chronic pain" (10/10, 100%), and "depression" (9/10, 90%). Other themes like "motivation for surgery" and "weight challenges" were reliably coded (8/10, 80%), while less frequent themes were variably identified across multiple iterations.
Conclusions: Large language models including GPT-4 can effectively identify key themes in analyzing qualitative health care data, showing moderate agreement with human analysis. While human analysis provided a richer diversity of subthemes, the consistency of AI suggests its use as a complementary tool in qualitative research. With AI rapidly advancing, future studies should iterate analyses and circumvent token limitations by segmenting data, furthering the breadth and depth of large language model-driven qualitative analyses.
(©Kevin Danis Li, Adrian M Fernandez, Rachel Schwartz, Natalie Rios, Marvin Nathaniel Carlisle, Gregory M Amend, Hiren V Patel, Benjamin N Breyer. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 21.08.2024.)
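Note on the agreement statistic: the abstract reports κ=0.401 but does not say how it was computed. As a rough illustration only, the sketch below assumes Cohen's kappa over binary presence/absence codes, one per interview, for a single theme. The function name `cohens_kappa` and the two example code lists are hypothetical and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical codes."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items where both raters assigned the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters' marginal rates.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes (1 = theme present, 0 = absent) for ten interviews.
# These values are made up for illustration and do not come from the study.
human_codes = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
gpt4_codes = [1, 0, 1, 0, 1, 1, 1, 0, 0, 1]

print(f"kappa = {cohens_kappa(human_codes, gpt4_codes):.3f}")
```

Kappa discounts the agreement expected by chance, so it is typically lower than the raw share of interviews on which the two coders match.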

Subjects :: Humans
Male
Adult
Middle Aged
Data Analysis
Research Personnel psychology
Research Personnel statistics & numerical data
Aged
Qualitative Research

Details

Language :: English
ISSN :: 1438-8871
Volume :: 26
Database :: MEDLINE
Journal :: Journal of medical Internet research
Publication Type :: Academic Journal
Accession number :: 39167785
Full Text :: https://doi.org/10.2196/56500
