SWordNet: Inferring semantically related words from software context

Empirical Software Engineering(2013)

引用 42|浏览72
暂无评分
摘要
Code search is an integral part of software development and program comprehension. The difficulty of code search lies in the inability to guess the exact words used in the code. Therefore, it is crucial for keyword-based code search to expand queries with semantically related words, e.g., synonyms and abbreviations, to increase the search effectiveness. However, it is limited to rely on resources such as English dictionaries and WordNet to obtain semantically related words in software because many words that are semantically related in software are not semantically related in English. On the other hand, many words that are semantically related in English are not semantically related in software. This paper proposes a simple and general technique to automatically infer semantically related words (referred to as rPairs) in software by leveraging the context of words in comments and code. In addition, we propose a ranking algorithm on the rPair results and study cross-project rPairs on two sets of software with similar functionality, i.e., media browsers and operating systems. We achieve a reasonable accuracy in nine large and popular code bases written in C and Java. Our further evaluation against the state of art shows that our technique can achieve a higher precision and recall. In addition, the proposed ranking algorithm improves the rPair extraction accuracy by bringing correct rPairs to the top of the list. Our cross-project study successfully discovers overlapping rPairs among projects of similar functionality and finds that cross-project rPairs are more likely to be correct than project-specific rPairs. Since the cross-project rPairs are highly likely to be general for software of the same type, the discovered overlapping rPairs can benefit other projects of the same type that have not been analyzed.
更多
查看译文
关键词
Semantically related words,Code search,Program comprehension
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要