Building the Extraction Model of the Software Entities from Full-Text of Research Articles Based on BERT.


Software entities mentioned in the full text of research articles are vital academic resources. Extracting them can improve knowledge organisation and is an important aspect of knowledge entitymetrics. In this study, the full texts of research articles published in the journal Scientometrics from 2010 to 2020 were collected. The extracted software entities were analysed and mined from several perspectives: their distribution across the structural sections of articles, their numbers of mentions and citations, and their evolution over time. To provide tools for entitymetrics, automated software entity extraction models were built using one machine learning model, the conditional random field (CRF), and two deep learning models, Bi-LSTM-CRF and Bidirectional Encoder Representations from Transformers (BERT). BERT achieved the highest F1 value, 84.99%. Future work includes applying the BERT-based model to extract software entities from other journals, so as to deepen the mining and analysis of software entities from multiple perspectives.
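The abstract reports an F1 value but does not spell out the evaluation scheme. A common choice for entity extraction is exact-match, entity-level precision, recall, and F1 computed over BIO-tagged tokens. The sketch below assumes that scheme and uses a hypothetical `SW` tag for software entities; the function names `bio_to_spans` and `entity_f1` are illustrative and are not the authors' evaluation code.

```python
from typing import List, Set, Tuple

# (start, end_exclusive, entity_type); "SW" below is a hypothetical software-entity tag
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one BIO tag sequence into a set of entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags):
        # An "I-" tag of the currently open type simply extends the span.
        if tag.startswith("I-") and tag[2:] == etype:
            continue
        # Any other tag ("O", "B-", or an "I-" of a different type) closes the open span.
        if start is not None:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # Lenient handling: a stray "I-" tag is treated as the start of a new span.
            start, etype = i, tag[2:]
    if start is not None:
        spans.add((start, len(tags), etype))
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1 over a corpus of sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)  # a span counts only if boundaries and type match
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Tokens: ["We", "used", "VOS", "viewer", "and", "Pajek"]
gold = [["O", "O", "B-SW", "I-SW", "O", "B-SW"]]
pred = [["O", "O", "B-SW", "O", "O", "B-SW"]]  # truncates "VOS viewer" to "VOS"
print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Under exact matching, the truncated prediction "VOS" earns no credit even though it overlaps the gold span, which is why entity-level F1 is usually stricter than token-level accuracy.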