1. Large Language Models Can Enable Inductive Thematic Analysis of a Social Media Corpus in a Single Prompt: Human Validation Study
- Author
Deiner, Michael S, Honcharov, Vlad, Li, Jiawei, Mackey, Tim K, Porco, Travis C, and Sarkar, Urmimala
- Subjects
Health Services and Systems, Health Sciences, Social Media, Humans, Natural Language Processing, generative large language model, generative pretrained transformer, GPT, Claude, Twitter, X (formerly known as Twitter), social media, inductive content analysis, COVID-19, vaccine hesitancy, infodemiology
- Abstract
Background: Manually analyzing public health-related content from social media provides valuable insights into the beliefs, attitudes, and behaviors of individuals, shedding light on trends and patterns that can inform public understanding, policy decisions, targeted interventions, and communication strategies. Unfortunately, the time and effort required from well-trained human subject matter experts make extensive manual social media listening unfeasible. Generative large language models (LLMs) can potentially summarize and interpret large amounts of text, but it is unclear to what extent LLMs can glean subtle health-related meanings in large sets of social media posts and reasonably report health-related themes.
Objective: We aimed to assess the feasibility of using LLMs for topic model selection or inductive thematic analysis of large collections of social media posts by attempting to answer the following question: Can LLMs conduct topic model selection and inductive thematic analysis as effectively as humans did in a prior manual study, or at least reasonably, as judged by subject matter experts?
Methods: We asked the same research question and used the same set of social media content for both the LLM selection of relevant topics and the LLM analysis of themes as in a published manual study about vaccine rhetoric. We used the results from that study as background for this LLM experiment by comparing the results from the prior manual human analyses with the analyses from 3 LLMs: GPT4-32K, Claude-instant-100K, and Claude-2-100K. We also assessed whether multiple LLMs had equivalent ability and assessed the consistency of repeated analysis from each LLM.
Results: The LLMs generally gave high rankings to the topics chosen previously by humans as most relevant. We reject a null hypothesis (P
- Published
2024