Learning to Discover Subsumptions between Software Engineering Concepts in Wikipedia.

SEKE(2016)

Abstract
Wikipedia contains a large number of concepts and rich semantic information. A number of knowledge base construction projects, such as WikiTaxonomy, DBpedia, and YAGO, have acquired data from Wikipedia. Despite the huge number of relations in Wikipedia, semantic relations (i.e., subsumptions) between domain concepts are rather sparse, especially in the software engineering (SE) area. Hence, it is difficult to derive a software engineering knowledge base directly from Wikipedia. Meanwhile, domain knowledge bases have become indispensable to a growing number of applications in software engineering, so discovering the missing semantic relations between software engineering concepts in Wikipedia is essential. In this paper, we propose an approach to automatically discover the missing subsumption relations between software engineering concepts. First, we extract the SE domain concepts from Wikipedia. Second, we design a machine-learning-based algorithm with some novel features to calculate the semantic relevancy between concepts. Third, we use a semi-supervised model that incorporates these features to discover the SE subsumptions. Experimental results show that our approach can effectively find the missing subsumption relations between software engineering concepts. Finally, we build a taxonomy that contains 193,593 concepts together with 357,662 subsumption relations. Compared with taxonomies extracted from general-purpose knowledge bases such as WikiTaxonomy, YAGO, and Schema.org, our dataset has a larger scale in the software engineering domain.

Index Terms: Subsumption Extraction, Software Engineering, Wikipedia
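The abstract does not specify the features or the semi-supervised model, so the following is only a minimal sketch of one common semi-supervised scheme (self-training) applied to candidate concept pairs. Everything named here is an assumption made for illustration: the function `self_train`, the nearest-centroid classifier, the `margin` threshold, the two-dimensional feature vectors, and the example pairs and scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# A candidate pair is (child concept, candidate parent concept).
Pair = Tuple[str, str]


def centroid(rows: List[List[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def self_train(
    features: Dict[Pair, List[float]],
    labeled: Dict[Pair, bool],
    margin: float = 0.1,
    max_rounds: int = 10,
) -> Dict[Pair, bool]:
    """Self-training with a nearest-centroid classifier (illustrative only).

    `labeled` must contain at least one positive and one negative seed pair.
    Each round, unlabeled pairs whose distance gap between the two class
    centroids is at least `margin` receive a pseudo-label; the loop stops
    when no new pair qualifies or after `max_rounds` rounds.
    """
    labels = dict(labeled)
    for _ in range(max_rounds):
        pos = [features[p] for p, y in labels.items() if y]
        neg = [features[p] for p, y in labels.items() if not y]
        c_pos, c_neg = centroid(pos), centroid(neg)
        newly_labeled: Dict[Pair, bool] = {}
        for pair, vec in features.items():
            if pair in labels:
                continue
            # Positive gap: the pair is closer to the subsumption centroid.
            gap = sq_dist(vec, c_neg) - sq_dist(vec, c_pos)
            if abs(gap) >= margin:
                newly_labeled[pair] = gap > 0
        if not newly_labeled:
            break
        labels.update(newly_labeled)
    return labels


# Made-up feature scores in [0, 1]; these are not results from the paper.
features = {
    ("Quicksort", "Sorting algorithm"): [0.9, 0.8],
    ("Sorting algorithm", "Quicksort"): [0.1, 0.2],
    ("Unit testing", "Software testing"): [0.8, 0.9],
    ("Software testing", "Unit testing"): [0.2, 0.1],
    ("Scrum", "Compiler"): [0.5, 0.5],
}
seeds = {
    ("Quicksort", "Sorting algorithm"): True,
    ("Sorting algorithm", "Quicksort"): False,
}
result = self_train(features, seeds)
```

In this toy run the two seed pairs are enough to pseudo-label the `Unit testing` pairs, while the ambiguous `("Scrum", "Compiler")` pair sits midway between the centroids and stays unlabeled. A real system would presumably use a stronger classifier and richer features than this sketch assumes.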