Extracting Methodology Components from AI Research Papers: A Data-driven Factored Sequence Labeling Approach

Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM 2023), 2023

Abstract
Extracting methodology component names from scientific articles is challenging because these entities occur in diverse contexts and exhibit different levels of granularity and containment relationships. We hypothesize that standard sequence labeling approaches may not adequately model the dependence of methodology name mentions on their contexts, because the vocabulary of such names is large, fast-evolving, and domain-specific. As a solution, we propose a factored approach, in which the mention-context dependencies are represented in a more fine-grained manner, allowing the model parameters to better adjust to the different characteristic patterns inherent in the data. In particular, we experiment with two variants of this factored approach: one that uses per-entity category information derived from an ontology, and another that uses the topology of the sentence embedding space to infer a category for each entity in a sentence. We demonstrate that both factored variants of SciBERT outperform their non-factored counterpart, a state-of-the-art model for scientific concept extraction.
Keywords
Information Extraction, Factored Model, Clustering, Scientific Literature
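
Illustrative sketch (not from the paper): the abstract describes a variant that infers a category from the topology of the sentence embedding space, but it does not name the clustering algorithm, the number of clusters, or how the inferred category conditions SciBERT. The snippet below is one plausible reading under stated assumptions: plain k-means over toy 2-D vectors standing in for sentence embeddings, with the cluster index used as the "factor" that would select which set of tagger parameters handles a sentence. All function names and data are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[List[float]]:
    """Plain k-means; deterministic init from the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def group_by_factor(sentences: List[str],
                    embeddings: List[Vector],
                    centroids: List[List[float]]) -> Dict[int, List[str]]:
    """Route each sentence to the factor (cluster id) of its embedding."""
    groups: Dict[int, List[str]] = {}
    for sentence, emb in zip(sentences, embeddings):
        groups.setdefault(nearest(emb, centroids), []).append(sentence)
    return groups


# Toy data: hypothetical 2-D "sentence embeddings" forming two clear clusters.
sentences = ["s0", "s1", "s2", "s3"]
embeddings = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]

centroids = kmeans(embeddings, k=2)
factors = group_by_factor(sentences, embeddings, centroids)
print(factors)
```

In a real pipeline the embeddings would come from a sentence encoder, and each factor would presumably index a separate set of sequence labeling parameters; both choices are assumptions here rather than details confirmed by the abstract.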