Extrinsically-Focused Evaluation of Omissions in Medical Summarization.
CoRR (2023)
Abstract
The goal of automated summarization techniques (Paice, 1990; Kupiec et al.,
1995) is to condense text by focusing on the most critical information.
Generative large language models (LLMs) have been shown to be robust
summarizers, yet traditional metrics struggle to capture their performance
(Goyal et al., 2022), especially for more powerful LLMs. In safety-critical
domains such as medicine, more
rigorous evaluation is required, especially given the potential for LLMs to
omit important information in the resulting summary. We propose MED-OMIT, a new
omission benchmark for medical summarization. Given a doctor-patient
conversation and a generated summary, MED-OMIT categorizes the chat into a set
of facts and identifies which are omitted from the summary. We further propose
to determine fact importance by simulating the impact of each fact on a
downstream clinical task: differential diagnosis (DDx) generation. MED-OMIT
leverages LLM prompt-based approaches which categorize the importance of facts
and cluster them as supporting or negating evidence to the diagnosis. We
evaluate MED-OMIT on a publicly released dataset of patient-doctor
conversations and find that MED-OMIT captures omissions better than alternative
metrics.
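The core idea of fact-based omission detection described above can be sketched minimally: decompose both the source conversation and the generated summary into atomic facts, then flag source facts absent from the summary. The paper performs fact extraction and importance scoring with LLM prompts; the sentence-level `extract_facts` stand-in below is a hypothetical simplification for illustration only.

```python
def extract_facts(text):
    """Stand-in fact extractor: one 'fact' per sentence.
    (MED-OMIT uses LLM prompting for this step; this is a toy proxy.)"""
    return {s.strip() for s in text.split(".") if s.strip()}

def omitted_facts(conversation, summary):
    """Return source facts with no counterpart in the summary."""
    return extract_facts(conversation) - extract_facts(summary)

conversation = "Patient reports chest pain. Pain started two days ago. No fever."
summary = "Patient reports chest pain. No fever."
print(sorted(omitted_facts(conversation, summary)))
# → ['Pain started two days ago']
```

In the full benchmark, each omitted fact would additionally be scored for importance by simulating its effect on a downstream differential-diagnosis task, rather than being treated uniformly as here.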