PubMed Central Citation Context Dataset

18th International Conference on Scientometrics & Informetrics (ISSI2021) (2021)

Abstract
The last few decades have witnessed rapid growth in scientific publications and scholarly data, and citation context is among the most frequently used data in scientometrics and natural language processing. However, current citation context datasets have limitations in several regards. We therefore propose a large citation context dataset based on open-access PubMed Central papers. In total, we generated about 97.5 million citing-cited document pairs and 34.7 million paragraphs of citation context from 2,658,541 PubMed Central papers in the biomedical and life sciences. The dataset can be applied not only to citation analysis but also to biomedical text mining and other natural language processing tasks.
Key words
dataset, context