Ensemble System For Identification Of Cited Text Spans: Based On Two Steps Of Feature Selection

INFORMATION RETRIEVAL (CCIR 2019)(2019)

引用 0|浏览6
暂无评分
摘要
CL-SciSumm Shared Task proposed a novel approach which is to generate scientific summary based on cited text spans (CTS) in target paper. This mechanism requires identifying CTS from reference paper according to citation sentence (citance) firstly. Therefore, CTS identification has then arisen the attention of many scholars since identified sentences will finally be aggregated for summary generation. Prior studies viewed this task as a text classification problem and feature selection is one key step for modeling the linkage between CTS and citance. Since most studies have paved the work by building features arbitrarily and applying them directly to model training. There is a lack of investigation to evaluate the effectiveness of features. Performance variation caused by different classifiers are barely taken into consideration as well. To further improve the performance of CTS identification, this paper builds an ensemble system based on two steps of feature selection. In the first step, we construct a set of features and do correlation analysis to select those which are higher-correlated with CTS. The second step is responsible for assigning several basic classifiers (SVM, Decision Tree and Logistic Regression) with their best performing feature sets. Experimental results demonstrate that our proposed systems can surpass the previous best performing one.
更多
查看译文
关键词
Cited text spans, Feature selection, Negative sampling, Text classification, Ensemble system
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要