Building a Gold Standard Dataset to Identify Articles About Geographic Information Science

IEEE Access (2022)

Abstract
Knowing the overall regional or international scientific output is vitally important in many areas of knowledge. In interdisciplinary areas such as Geographic Information Science (GISc), however, it is not enough to count the papers published in specific journals. Most of these journals, such as the International Journal of Remote Sensing (IJRS), welcome GISc papers but are not exclusive to that area, so the production attributed to authors in a region must consider not only affiliation but also whether each paper falls within the scope of GISc. IJRS publishes far more papers than any other GISc journal, so it is important to assess quantitatively how many of them belong to GISc. In this work, a representative sample of IJRS articles published over a period of almost 30 years was analyzed using a specific GISc definition. A panel of experts manually classified these articles, and the resulting dataset was built, analyzed, and statistically tested. As a result, we estimate that between 47% and 76% of IJRS articles can be considered GISc, at a 95% confidence level. Beyond this primary goal, the dataset could serve as a gold standard for future classification tasks. It is the first GISc dataset of its kind and may be used to train artificial intelligence systems capable of performing the same classification automatically and at scale. A similar procedure could be applied to other interdisciplinary fields of knowledge as well.
Keywords
Earth, Production, Manuals, Information science, Geography, Task analysis, Social sciences, Gold standard, manual classification, indexer consistency, geographic information science
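
The abstract reports its estimate as an interval at a 95% confidence level. As a rough illustration of how an interval for a proportion can be computed from a manually classified sample, the Python sketch below uses the Wilson score interval. This choice is an assumption made for illustration only, not the authors' stated method, and it ignores disagreement among experts. The function name `wilson_interval` and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    successes: number of sampled articles labeled as GISc (hypothetical input)
    n:         total number of sampled articles
    z:         normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")

    p_hat = successes / n
    denom = 1.0 + z * z / n
    # Center of the interval is pulled slightly toward 0.5.
    center = (p_hat + z * z / (2 * n)) / denom
    # Half-width combines sampling variance with a small-sample correction.
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    low, high = wilson_interval(successes=50, n=100)
    print(f"95% interval: {low:.3f} to {high:.3f}")
```

With a fixed observed proportion, a smaller sample yields a wider interval, which is one reason an estimate from a modest expert-labeled sample can span a broad range.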