Accurate and Well-Calibrated ICD Code Assignment Through Attention Over Diverse Label Embeddings
CoRR (2024)
Abstract
Although the International Classification of Diseases (ICD) has been adopted
worldwide, manually assigning ICD codes to clinical text is time-consuming,
error-prone, and expensive, motivating the development of automated approaches.
This paper describes a novel approach for automated ICD coding, combining
several ideas from previous related work. Specifically, we employ a strong
Transformer-based model as a text encoder and, to handle lengthy clinical
narratives, we explore either (a) adapting the base encoder model into a
Longformer, or (b) dividing the text into chunks and processing each chunk
independently. The representations produced by the encoder are combined with a
label embedding mechanism that exploits diverse ICD code synonyms. Experiments
with different splits of the MIMIC-III dataset show that the proposed approach
outperforms the current state-of-the-art models in ICD coding, with the label
embeddings significantly contributing to the good performance. Our approach
also leads to properly calibrated classification results, which can effectively
inform downstream tasks such as quantification.
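
To make the two main ingredients concrete, the following is a minimal, library-free sketch of (b) splitting a long token sequence into chunks and of attention over several synonym embeddings for one ICD code. It is an illustration under stated assumptions, not the authors' implementation: the function names, the dot-product scoring, and the max-pooling over synonyms are all hypothetical choices, and a real system would use learned encoder outputs rather than hand-written vectors.

```python
import math


def chunk_tokens(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size items."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def label_probability(token_vectors, synonym_vectors):
    """Score one label from several synonym embeddings (hypothetical scheme).

    Each synonym embedding acts as an attention query over the token
    vectors; the resulting per-synonym logits are max-pooled (an assumption)
    and squashed with a sigmoid.
    """
    logits = []
    for query in synonym_vectors:
        weights = softmax([dot(query, h) for h in token_vectors])
        # Attention-weighted sum of token vectors for this synonym.
        context = [
            sum(w * h[d] for w, h in zip(weights, token_vectors))
            for d in range(len(query))
        ]
        logits.append(dot(query, context))
    return 1.0 / (1.0 + math.exp(-max(logits)))


# Toy usage: two 2-dimensional token vectors, one label with two synonyms.
chunks = chunk_tokens(["pt", "has", "acute", "renal", "failure"], 2)
prob = label_probability([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]])
```

In this sketch each chunk would be encoded independently and the resulting token vectors concatenated before the label attention step; because of the max-pooling, adding a synonym can only raise or preserve a label's score.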