Data labeling through the centralities of co-reference networks improves the classification accuracy of scientific papers

Journal of Informetrics (2024)

Abstract

Labeled data are fed to the learning models of classification tasks to help them learn to classify unlabeled data. Large numbers of papers are linked through citations to a few influential papers, far fewer than the total, which, if labeled, would spread label information to most of the papers. We used the co-reference relationship between papers because the references cited by the papers in a dataset usually are not all contained in that dataset. We formulated optimal paper labeling as the problem of picking a given fraction of nodes from a co-reference network so as to maximize the number of their neighbors, which is a submodular maximization problem with a cardinality constraint and is NP-hard for general networks. We solved it approximately by picking nodes according to their ranks under specific network centralities. We further proved that labeling papers according to the rank of degree, the lowest-complexity centrality, gives a near-optimal solution under specific constraints on the maximum degree of the co-reference network and on the labeling proportion. Experimental results show that our method significantly improves classification accuracy.
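
The degree-rank labeling strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that two papers are co-reference neighbors when they share at least one cited reference, and that the objective counts neighbors outside the labeled set; the abstract does not spell out either detail. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def build_coreference_network(references):
    """Map each paper to the set of papers sharing at least one reference.

    `references` maps a paper id to the set of reference ids it cites.
    Assumption: one shared reference is enough to create an edge.
    """
    adjacency = {paper: set() for paper in references}
    for a, b in combinations(references, 2):
        if references[a] & references[b]:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def label_by_degree(adjacency, fraction):
    """Pick the given fraction of papers with the highest degree."""
    budget = max(1, int(fraction * len(adjacency)))
    # Sort by descending degree; ties are broken by paper id for determinism.
    ranked = sorted(adjacency, key=lambda p: (-len(adjacency[p]), p))
    return ranked[:budget]


def covered_neighbors(adjacency, labeled):
    """Return unlabeled papers adjacent to at least one labeled paper."""
    labeled = set(labeled)
    reached = set()
    for paper in labeled:
        reached |= adjacency[paper]
    return reached - labeled


# Toy data (hypothetical): p1 shares a reference with p2, p3 and p4.
toy_references = {
    "p1": {"r1", "r2", "r3"},
    "p2": {"r1"},
    "p3": {"r2"},
    "p4": {"r3"},
    "p5": {"r9"},
}
network = build_coreference_network(toy_references)
chosen = label_by_degree(network, fraction=0.2)
reached = covered_neighbors(network, chosen)
```

On this toy network, labeling the single highest-degree paper reaches three of the four remaining papers, while the isolated paper stays unreached. Ranking by degree needs only one pass over the adjacency lists plus a sort, which is why it is the cheapest centrality to compute.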

Keywords

Paper classification, Citation networks, Labeling strategy
