Extracting Social Support and Social Isolation Information from Clinical Psychiatry Notes: Comparing a Rule-based NLP System and a Large Language Model
arXiv (2024)
Abstract
Background: Social support (SS) and social isolation (SI) are social
determinants of health (SDOH) associated with psychiatric outcomes. In
electronic health records (EHRs), individual-level SS/SI is typically
documented as narrative clinical notes rather than structured coded data.
Natural language processing (NLP) algorithms can automate the otherwise
labor-intensive process of data extraction.
Data and Methods: Psychiatric encounter notes from Mount Sinai Health System
(MSHS, n=300) and Weill Cornell Medicine (WCM, n=225) were annotated to
establish a gold-standard corpus. Two systems were developed to identify
mentions of SS and SI and their subcategories (e.g., social network,
instrumental support, and loneliness): a lexicon-based rule-based system (RBS)
and a large language model (LLM) built on FLAN-T5-XL.
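To make the lexicon-based approach concrete, the following is a minimal
sketch of how a rule-based matcher of this kind could work. It is not the
authors' system: the lexicon terms, the LEXICON and PATTERNS names, and the
extract_mentions function are all hypothetical and chosen only for
illustration. The three subcategory names come from the examples listed above.

```python
import re

# Hypothetical mini-lexicon for illustration only. These terms are
# assumptions and are NOT taken from the paper's actual lexicons.
LEXICON = {
    "social network": ["lives with", "close friends", "married"],
    "instrumental support": ["helps with", "drives him to", "drives her to"],
    "loneliness": ["lonely", "loneliness", "feels alone"],
}

# One case-insensitive, word-bounded pattern per subcategory.
PATTERNS = {
    subcategory: re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    for subcategory, terms in LEXICON.items()
}


def extract_mentions(note):
    """Return (subcategory, matched_text) pairs in order of appearance."""
    hits = []
    for subcategory, pattern in PATTERNS.items():
        for match in pattern.finditer(note):
            hits.append((match.start(), subcategory, match.group(0)))
    hits.sort()  # order by character offset in the note
    return [(subcategory, text) for _, subcategory, text in hits]


if __name__ == "__main__":
    example = "Patient feels lonely; sister helps with groceries."
    for subcategory, text in extract_mentions(example):
        print(subcategory, "->", text)
```

A matcher like this only fires on the exact phrases in its lexicon, which is
one reason a rule-based system can be tuned to track annotation guidelines
closely.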
Results: For extracting SS/SI, the RBS obtained higher macro-averaged
F-scores than the LLM at both MSHS (0.89 vs. 0.65) and WCM (0.85 vs. 0.82). For
extracting subcategories, the RBS also outperformed the LLM at both MSHS (0.90
vs. 0.62) and WCM (0.82 vs. 0.81).
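For readers unfamiliar with the metric, a macro-averaged F-score is the
unweighted mean of the per-category F-scores, so rare categories count as much
as common ones. The sketch below shows the standard computation on toy labels.
It assumes one label per item for simplicity; the function names and the toy
data are illustrative assumptions and do not reproduce the paper's evaluation
code or results.

```python
def f1_score(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1 scores over the given labels."""
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        scores.append(f1_score(tp, fp, fn))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration only, not data from the study.
    gold = ["SS", "SS", "SI", "SI"]
    pred = ["SS", "SI", "SI", "SI"]
    print(round(macro_f1(gold, pred, ["SS", "SI"]), 4))
```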
Discussion and Conclusion: Unexpectedly, the RBS outperformed the LLM across
all metrics. Intensive review indicates that this finding stems from the
divergent approaches taken by the RBS and the LLM. The RBS was designed and
refined to follow the same specific rules as the gold-standard annotations.
Conversely, the LLM was more inclusive in its categorization and conformed to
common English-language understanding. Both approaches offer advantages and
are available as open source for future testing.