LLM-Based Section Identifiers Excel on Open Source but Stumble in Real World Applications
arxiv(2024)
摘要
Electronic health records (EHR) even though a boon for healthcare
practitioners, are growing convoluted and longer every day. Sifting around
these lengthy EHRs is taxing and becomes a cumbersome part of physician-patient
interaction. Several approaches have been proposed to help alleviate this
prevalent issue either via summarization or sectioning, however, only a few
approaches have truly been helpful in the past. With the rise of automated
methods, machine learning (ML) has shown promise in solving the task of
identifying relevant sections in EHR. However, most ML methods rely on labeled
data which is difficult to get in healthcare. Large language models (LLMs) on
the other hand, have performed impressive feats in natural language processing
(NLP), that too in a zero-shot manner, i.e. without any labeled data. To that
end, we propose using LLMs to identify relevant section headers. We find that
GPT-4 can effectively solve the task on both zero and few-shot settings as well
as segment dramatically better than state-of-the-art methods. Additionally, we
also annotate a much harder real world dataset and find that GPT-4 struggles to
perform well, alluding to further research and harder benchmarks.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要