Generic features selection for structure classification of diverse styled scholarly articles

Multimedia Tools and Applications（2024）

引用 0|浏览3

暂无评分

摘要

The enormous growth in online research publications in diversified domains has attracted the research community to extract these valuable scientific resources by searching online digital libraries and publishers’ websites. A precise search is desired to enlist most related articles by applying semantic queries to the document’s metadata and the structural elements. The online search engines and digital libraries offer only keyword-based search on full-body text, which creates excessive results. Therefore, the research article’s structural and metadata information has to be stored in machine comprehendible form by the online research publishers. The research community in recent years has adopted different approaches to extract structural information from research documents like rule-based heuristics and machine-learning-based approaches. Studies suggest that machine-learning-based techniques have produced optimum results for document structure extraction from publishers having diversified publication layouts. In this paper, we have proposed thirteen different logical layout structural (LLS) components. We have identified a two-staged innovative set of generic features that are associated with the LLS. This approach has given our technique an advantage against the state-of-the-art for structural classification of digital scientific articles with diversified publication styles. We have applied chi-square ( chi^2 ) for feature selection, and the final result has revealed that SVM (Kernal function) has produced an optimum result with an overall F-measure of 0.95.

查看译文

关键词

Features Engineering,Machine Learning,Research Article,Metadata Extraction,Text mining

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要