Extracting social support and social isolation information from clinical psychiatry notes: comparing a rule-based natural language processing system and a large language model.

Authors :
Patra, Braja Gopal
Lepow, Lauren A
Kumar, Praneet Kasi Reddy Jagadeesh
Vekaria, Veer
Sharma, Mohit Manoj
Adekkanattu, Prakash
Fennessy, Brian
Hynes, Gavin
Landi, Isotta
Sanchez-Ruiz, Jorge A
Ryu, Euijung
Biernacka, Joanna M
Nadkarni, Girish N
Talati, Ardesheer
Weissman, Myrna
Olfson, Mark
Mann, J John
Zhang, Yiye
Charney, Alexander W
Pathak, Jyotishman
Source :
Journal of the American Medical Informatics Association; Jan2025, Vol. 32 Issue 1, p218-226, 9p
Publication Year :
2025

Abstract

Objectives: Social support (SS) and social isolation (SI) are social determinants of health (SDOH) associated with psychiatric outcomes. In electronic health records (EHRs), individual-level SS/SI is typically documented in narrative clinical notes rather than as structured coded data. Natural language processing (NLP) algorithms can automate the otherwise labor-intensive process of extraction of such information.

Materials and Methods: Psychiatric encounter notes from Mount Sinai Health System (MSHS, n = 300) and Weill Cornell Medicine (WCM, n = 225) were annotated to create a gold-standard corpus. A rule-based system (RBS) involving lexicons and a large language model (LLM) using FLAN-T5-XL were developed to identify mentions of SS and SI and their subcategories (eg, social network, instrumental support, and loneliness).

Results: For extracting SS/SI, the RBS obtained higher macroaveraged F1-scores than the LLM at both MSHS (0.89 versus 0.65) and WCM (0.85 versus 0.82). For extracting the subcategories, the RBS also outperformed the LLM at both MSHS (0.90 versus 0.62) and WCM (0.82 versus 0.81).

Discussion and Conclusion: Unexpectedly, the RBS outperformed the LLMs across all metrics. An intensive review demonstrates that this finding is due to the divergent approach taken by the RBS and LLM. The RBS was designed and refined to follow the same specific rules as the gold-standard annotations. Conversely, the LLM was more inclusive with categorization and conformed to common English-language understanding. Both approaches offer advantages, although additional replication studies are warranted.

[ABSTRACT FROM AUTHOR]
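
The macroaveraged F1-scores reported in the abstract average per-category F1 with equal weight, so rare categories count as much as common ones. The Python sketch below illustrates that metric under stated assumptions: it treats each item as carrying exactly one label, and the function name `macro_f1` and the label strings are hypothetical. The abstract does not say whether the study scored at the mention or note level, so this is not the authors' evaluation code.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred.

    Assumes single-label items; gold and pred are equal-length sequences.
    """
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true label but was missed

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for illustration only
gold = ["SS", "SS", "SI", "none"]
pred = ["SS", "SI", "SI", "none"]
score = macro_f1(gold, pred)
```

In this toy example the per-label F1 values are 2/3 for SS, 2/3 for SI, and 1 for none, so the macro average is 7/9.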

Details

Language :
English
ISSN :
1067-5027
Volume :
32
Issue :
1
Database :
Complementary Index
Journal :
Journal of the American Medical Informatics Association
Publication Type :
Academic Journal
Accession number :
181680584
Full Text :
https://doi.org/10.1093/jamia/ocae260