Extracting social support and social isolation information from clinical psychiatry notes: comparing a rule-based natural language processing system and a large language model

Braja Gopal Patra; Lauren A Lepow; Praneet Kasi Reddy Jagadeesh Kumar; Veer Vekaria; Mohit Manoj Sharma; Prakash Adekkanattu; Brian Fennessy; Gavin Hynes; Isotta Landi; Jorge A Sanchez-Ruiz; Euijung Ryu; Joanna M Biernacka; Girish N Nadkarni; Ardesheer Talati; Myrna Weissman; Mark Olfson; J John Mann; Yiye Zhang; Alexander W Charney; Jyotishman Pathak

doi:10.1093/jamia/ocae260

J Am Med Inform Assoc. 2025 Jan 1;32(1):218-226. doi: 10.1093/jamia/ocae260.

Authors

Braja Gopal Patra¹, Lauren A Lepow², Praneet Kasi Reddy Jagadeesh Kumar¹, Veer Vekaria¹, Mohit Manoj Sharma¹, Prakash Adekkanattu³, Brian Fennessy², Gavin Hynes², Isotta Landi², Jorge A Sanchez-Ruiz⁴, Euijung Ryu⁵, Joanna M Biernacka⁴,⁵, Girish N Nadkarni², Ardesheer Talati⁶,⁷, Myrna Weissman⁶,⁷, Mark Olfson⁶,⁷,⁸, J John Mann⁷,⁸, Yiye Zhang¹, Alexander W Charney², Jyotishman Pathak¹,⁹

Affiliations

¹ Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA.
² Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
³ Information Technologies and Services, Weill Cornell Medicine, New York, NY 10065, USA.
⁴ Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN 55905, USA.
⁵ Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA.
⁶ Department of Psychiatry, Columbia University Vagelos College of Physicians and Surgeons, New York, NY 10032, USA.
⁷ New York State Psychiatric Institute, New York, NY 10032, USA.
⁸ Columbia University Irving Medical Center, New York, NY 10032, USA.
⁹ Department of Psychiatry, Weill Cornell Medicine, New York, NY 10065, USA.

PMID: 39423850
PMCID: PMC11648716 (available on 2025-10-18)
DOI: 10.1093/jamia/ocae260

Abstract

Objectives: Social support (SS) and social isolation (SI) are social determinants of health (SDOH) associated with psychiatric outcomes. In electronic health records (EHRs), individual-level SS/SI is typically documented in narrative clinical notes rather than as structured coded data. Natural language processing (NLP) algorithms can automate the otherwise labor-intensive extraction of such information.

Materials and methods: Psychiatric encounter notes from Mount Sinai Health System (MSHS, n = 300) and Weill Cornell Medicine (WCM, n = 225) were annotated to create a gold-standard corpus. Two systems were developed to identify mentions of SS and SI and their subcategories (eg, social network, instrumental support, and loneliness): a rule-based system (RBS) built on lexicons, and a large language model (LLM) approach using FLAN-T5-XL.
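As a rough illustration of what a lexicon-driven rule-based extractor can look like, the sketch below matches note text against per-subcategory term lists. It is a minimal sketch, not the study's system: the subcategory names come from the abstract, but the lexicon terms, the `LEXICON` dictionary, and the `find_mentions` function are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical lexicon: subcategory names are taken from the abstract,
# but the terms themselves are illustrative and NOT the study's lexicon.
LEXICON = {
    "loneliness": ["lonely", "feels alone"],
    "social network": ["lives with", "close friends"],
    "instrumental support": ["drives him to", "helps with"],
}


def find_mentions(note):
    """Return (subcategory, matched_text, start_offset) tuples in text order."""
    lowered = note.lower()
    found = []
    for category, terms in LEXICON.items():
        for term in terms:
            # Word boundaries avoid matching inside longer words.
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, lowered):
                found.append((category, match.group(0), match.start()))
    return sorted(found, key=lambda item: item[2])


if __name__ == "__main__":
    example = "Patient feels lonely and lives with her sister."
    for category, text, offset in find_mentions(example):
        print(category, repr(text), offset)
```

A real system of this kind would typically also need rules for negation, family history, and other context that simple term matching ignores; those are omitted here.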

Results: For extracting SS/SI, the RBS obtained higher macroaveraged F1-scores than the LLM at both MSHS (0.89 versus 0.65) and WCM (0.85 versus 0.82). For extracting the subcategories, the RBS also outperformed the LLM at both MSHS (0.90 versus 0.62) and WCM (0.82 versus 0.81).
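For readers unfamiliar with the metric, a macroaveraged F1-score computes F1 separately for each class and then takes the unweighted mean, so rare classes count as much as common ones. The sketch below shows that calculation under the assumption of one gold label and one predicted label per item; the abstract does not state the unit of evaluation, and the function name `macro_f1` and the toy labels are illustrative only.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels, not study data.
    gold = ["SS", "SS", "SI", "SI"]
    pred = ["SS", "SI", "SI", "SI"]
    print(round(macro_f1(gold, pred), 4))
```

In this toy case the per-class F1 values are 2/3 for SS and 0.8 for SI, so the macroaverage is 11/15.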

Discussion and conclusion: Unexpectedly, the RBS outperformed the LLM across all metrics. An intensive review demonstrates that this finding is due to the divergent approaches taken by the RBS and the LLM. The RBS was designed and refined to follow the same specific rules as the gold-standard annotations. Conversely, the LLM was more inclusive in its categorization and conformed to common English-language understanding. Both approaches offer advantages, although additional replication studies are warranted.

Keywords: electronic health records; large language model; natural language processing; social determinants of health; social isolation; social support.

Publication types

Comparative Study

MeSH terms

Algorithms
Data Mining / methods
Electronic Health Records*
Humans
Mental Disorders
Natural Language Processing*
Social Isolation*
Social Support*
