Extracting Social Support and Social Isolation Information from Clinical Psychiatry Notes: Comparing a Rule-based NLP System and a Large Language Model
CoRR (2024)
Abstract
Background: Social support (SS) and social isolation (SI) are social
determinants of health (SDOH) associated with psychiatric outcomes. In
electronic health records (EHRs), individual-level SS/SI is typically
documented as narrative clinical notes rather than structured coded data.
Natural language processing (NLP) algorithms can automate the otherwise
labor-intensive process of data extraction.
Data and Methods: Psychiatric encounter notes from Mount Sinai Health System
(MSHS, n=300) and Weill Cornell Medicine (WCM, n=225) were annotated to
establish a gold-standard corpus. A rule-based system (RBS) built on
lexicons and a large language model (LLM) based on FLAN-T5-XL were developed to
identify mentions of SS and SI and their subcategories (e.g., social network,
instrumental support, and loneliness).
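
As a rough illustration of what lexicon-driven extraction can look like, the sketch below matches note text against a small term list per subcategory. It is not the authors' system: the lexicon entries, the label names, and the helper functions `compile_lexicon` and `extract_mentions` are all hypothetical and chosen only for the example.

```python
import re

# Hypothetical lexicon: illustrative terms only, not the lexicons used in the paper.
LEXICON = {
    "social_isolation/loneliness": ["lonely", "loneliness", "feels alone"],
    "social_support/instrumental_support": ["helps with groceries", "drives him to"],
    "social_support/social_network": ["close friends", "supportive family"],
}


def compile_lexicon(lexicon):
    """Build one case-insensitive, word-bounded regex per label."""
    patterns = {}
    for label, terms in lexicon.items():
        alternation = "|".join(re.escape(term) for term in terms)
        patterns[label] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns


def extract_mentions(note, patterns):
    """Return (label, matched text, character offset) tuples in order of appearance."""
    mentions = []
    for label, pattern in patterns.items():
        for match in pattern.finditer(note):
            mentions.append((label, match.group(0).lower(), match.start()))
    return sorted(mentions, key=lambda mention: mention[2])


PATTERNS = compile_lexicon(LEXICON)
note = "Patient reports feeling lonely since the move, but has close friends nearby."
mentions = extract_mentions(note, PATTERNS)
for label, text, offset in mentions:
    print(f"{offset:3d}  {label:40s} {text}")
```

A matcher of this kind only finds what its rules anticipate, which is consistent with the abstract's description of an RBS refined to follow the annotation rules.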
Results: For extracting SS/SI, the RBS obtained higher macro-averaged
f-scores than the LLM at both MSHS (0.89 vs. 0.65) and WCM (0.85 vs. 0.82). For
extracting subcategories, the RBS also outperformed the LLM at both MSHS (0.90
vs. 0.62) and WCM (0.82 vs. 0.81).
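
For readers unfamiliar with the metric, a macro-averaged F-score is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. The sketch below computes it on made-up labels; the single-label-per-note setup, the label names, and the data are assumptions for illustration and do not reproduce the paper's evaluation.

```python
def f1_per_class(gold, pred, label):
    """F1 for one class, computed from true/false positives and false negatives."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 scores."""
    return sum(f1_per_class(gold, pred, label) for label in labels) / len(labels)


# Toy data (hypothetical): one label per note.
gold = ["SS", "SS", "SI", "none", "SI", "none"]
pred = ["SS", "SI", "SI", "none", "SI", "SS"]
labels = ["SS", "SI", "none"]

score = macro_f1(gold, pred, labels)
print(f"macro-averaged F1: {score:.3f}")
```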
Discussion and Conclusion: Unexpectedly, the RBS outperformed the LLM across
all metrics. Intensive review demonstrates that this finding is due to the
divergent approaches taken by the RBS and the LLM. The RBS was designed and
refined to follow the same specific rules as the gold-standard annotations.
Conversely, the LLM was more inclusive with categorization and conformed to
common English-language understanding. Both approaches offer advantages and
are made available open-source for future testing.