2SCE-4SL: a 2-stage causality extraction framework for scientific literature

Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE) (2023)

Abstract
Extracting causality from scientific literature is a crucial task that underpins many downstream knowledge-driven applications. To this end, this paper presents a novel causality extraction framework for scientific literature, called 2-Stage Causality Extraction for Scientific Literature (2SCE-4SL). The framework consists of two stages. In stage 1, terms and causal trigger words are identified from causal sentences in the literature, and noisy causal triplets are then collocated. In stage 2, we propose a Transformer-based denoising autoencoder to represent the causal sentences. This approach learns the causal dependency and contextual information of sentences by incorporating causal trigger word tagging and noise elimination, as well as by injecting domain-specific knowledge. By combining the causality structure from stage 1 with the causality representation from stage 2, the true causal triplets are identified among the noisy causal triplets. We conducted experiments on an open-access scientific literature dataset, comparing performance across disciplines, training data volumes, and document lengths, and with and without the causality representation. The average precision of 2SCE-4SL was 0.8146 and the average F1 was 0.8308, with the best performance achieved on full-text data. We also verified the effectiveness of the causality representation in stage 2, demonstrating that the architecture can capture the causal dependency of sentences and achieve good performance on two related tasks. Overall, detailed comparative and ablation experiments showed that 2SCE-4SL requires only a small amount of annotated data to achieve better performance and domain adaptability on scientific literature.
Keywords
2SCE-4SL, Causality extraction, Causality representation, Scientific literature mining
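
The abstract says that stage 1 collocates noisy causal triplets from the terms and causal trigger words found in a sentence, but it does not say how the collocation is done. The sketch below is only one plausible reading, assuming an exhaustive pairing of every ordered pair of distinct terms with every trigger word. The function name `collocate_noisy_triplets` and the example sentence are hypothetical and are not taken from the paper.

```python
from itertools import permutations
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (candidate cause, trigger word, candidate effect)


def collocate_noisy_triplets(terms: List[str], triggers: List[str]) -> List[Triplet]:
    """Build candidate causal triplets for one sentence.

    Assumption (not from the paper): every ordered pair of distinct terms
    is combined with every trigger word. Most of the resulting candidates
    are wrong, which is why they count as "noisy" and need a later
    filtering step such as the stage 2 described in the abstract.
    """
    candidates: List[Triplet] = []
    for trigger in triggers:
        for cause, effect in permutations(terms, 2):
            candidates.append((cause, trigger, effect))
    return candidates


# Hypothetical sentence: "Smoking causes lung cancer and raises mortality."
terms = ["smoking", "lung cancer", "mortality"]
triggers = ["causes"]
candidates = collocate_noisy_triplets(terms, triggers)
for triplet in candidates:
    print(triplet)
```

With n terms and k trigger words this pairing yields n(n-1)k candidates, so the candidate set grows quickly in long sentences. A filtering stage that scores each candidate against a sentence representation, as the abstract describes for stage 2, is what would separate the true triplets from the rest.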