Somef: A Framework For Capturing Scientific Software Metadata From Its Documentation

2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)(2019)

Abstract
Scientific software has become a key asset to reproduce and understand the products of scientific research in many disciplines. However, scientific software is becoming increasingly complex and, as a result, researchers need to spend a significant amount of time finding, reading and understanding software documentation to set it up. In this paper we describe SoMEF, a Software Metadata Extraction Framework designed to help highlight the most important parts of scientific software documentation. SoMEF processes the README files in GitHub repositories to automatically extract which parts of their text refer to the description, installation, invocation, or citation of a software component. Despite its simple features, SoMEF successfully categorizes README excerpts with a minimum precision of 0.92 and ROC AUC of 0.90. These results, tested on a corpus of over 70 scientific software repositories, are a promising start towards automatically generating knowledge graphs of scientific software metadata.
Keywords
scientific software metadata, SoMEF, scientific software documentation, software component, scientific software repositories, software metadata extraction framework
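
To make the task in the abstract concrete, the following is a minimal sketch of sorting README excerpts into the four categories named there (description, installation, invocation, citation). It is not SoMEF's method: the abstract reports trained classifiers, whereas this sketch swaps in a plain keyword match on Markdown headers. The names `CATEGORY_KEYWORDS`, `split_readme`, and `categorize`, and the keyword lists themselves, are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical keyword lists, for illustration only.
# These are NOT the features or classifiers used by SoMEF.
CATEGORY_KEYWORDS = {
    "description": ("overview", "about", "introduction", "description"),
    "installation": ("install", "setup", "requirements", "dependencies"),
    "invocation": ("usage", "run", "example", "quick start"),
    "citation": ("cite", "citation", "bibtex", "reference"),
}

# Matches ATX-style Markdown headers such as "## Installation".
# Limitation: it would also match "# comment" lines inside code fences.
HEADER = re.compile(r"^#{1,6}[ \t]+(.*)$", re.MULTILINE)


def split_readme(text):
    """Split Markdown text into (header, body) excerpts at each header."""
    matches = list(HEADER.finditer(text))
    excerpts = []
    for i, match in enumerate(matches):
        # The body runs from the end of this header to the start of the next.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        excerpts.append((match.group(1).strip(), text[match.end():end].strip()))
    return excerpts


def categorize(header):
    """Return every category whose keywords appear in the header text."""
    lowered = header.lower()
    return [
        category
        for category, words in CATEGORY_KEYWORDS.items()
        if any(word in lowered for word in words)
    ]


if __name__ == "__main__":
    sample = (
        "# Tool\n\n## Installation\npip install tool\n\n"
        "## Usage\ntool --help\n\n## How to cite\nSee CITATION.\n"
    )
    for header, body in split_readme(sample):
        print(header, "->", categorize(header))
```

A keyword match like this only inspects headers and will miss excerpts with uninformative titles, which is one reason a learned text classifier over the excerpt body, as the abstract describes, is the more robust choice.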