Numerical Attributes Learning for Cardiac Failure Diagnostic from Clinical Narratives – A LESA-CamemBERT-bio Approach
arXiv (2024)
Abstract
Medical records created by healthcare professionals upon patient admission
are rich in details critical for diagnosis. Yet, their potential is not fully
realized because of obstacles such as complex medical language, inadequate
comprehension of medical numerical data by state-of-the-art Large Language
Models (LLMs), and the limitations imposed by small annotated training
datasets. This research aims to classify numerical values extracted from
medical documents across seven distinct physiological categories, employing
CamemBERT-bio. Previous studies suggested that transformer-based models might
not perform as well as traditional NLP models in such tasks. To enhance
CamemBERT-bio's performance, we introduce two main innovations: integrating
keyword embeddings into the model and adopting a number-agnostic strategy by
excluding all numerical data from the text. The implementation of label
embedding techniques refines the attention mechanisms, while the technique of
using a 'numerical-blind' dataset aims to bolster context-centric learning.
Another key component of our research is determining the criticality of
extracted numerical data. To achieve this, we used a simple approach:
checking whether each value falls within the established standard ranges.
Our findings are encouraging, showing substantial improvements in the
effectiveness of CamemBERT-bio, which surpasses conventional methods with an
F1 score of 0.89. This represents an increase of more than 20% over the 0.73
F1 score of traditional approaches and of more than 9% over the 0.82 F1 score
of state-of-the-art approaches.
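
The two preprocessing ideas named in the abstract, the 'numerical-blind' text and the standard-range check, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The placeholder token `<NUM>`, the function names, the category keys, and the range values are all hypothetical and chosen only for illustration. The abstract says numbers are excluded from the text but does not say whether they are deleted or replaced, so the sketch assumes replacement by a placeholder. The ranges are not clinical guidance.

```python
import re

# Hypothetical placeholder token; the abstract does not specify one.
NUM_TOKEN = "<NUM>"

# Matches integers and decimals written with either "." or "," (common in French notes).
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)?")


def mask_numbers(text: str) -> str:
    """Return a 'numerical-blind' copy of the text with every number replaced."""
    return NUMBER_RE.sub(NUM_TOKEN, text)


# Illustrative ranges only: assumed values, not taken from the paper.
REFERENCE_RANGES = {
    "heart_rate_bpm": (60.0, 100.0),
    "temperature_c": (36.1, 37.2),
}


def is_out_of_range(category: str, value: float) -> bool:
    """Flag a value as critical when it falls outside its reference range."""
    low, high = REFERENCE_RANGES[category]
    return not (low <= value <= high)


if __name__ == "__main__":
    print(mask_numbers("FC 120 bpm, T 38,5 C"))
    print(is_out_of_range("heart_rate_bpm", 120.0))
```

In this sketch the classifier would see only the masked text, so it has to rely on the surrounding words to decide which physiological category a number belongs to, while the original value is kept aside for the range check.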