MulCS: Towards a Unified Deep Representation for Multilingual Code Search

2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2023

Abstract
Code search retrieves relevant code snippets for a given query and has become essential for assisting programmers in software development. With large and rapidly growing source code repositories available across many languages, multilingual code search can leverage more training data to learn complementary information across languages. Contrastive learning can naturally capture the similarity between functionally equivalent code in different languages by narrowing the distance between objects with the same function while keeping dissimilar objects further apart. Some existing works address monolingual code search with contrastive learning; however, they mainly exploit the textual semantics or syntactic structures of each specific programming language for code representation. Because languages differ widely in syntax, format, and structure, these methods limit the performance of contrastive learning in multilingual training. To bridge this gap, we propose MulCS, a unified semantic graph representation approach to multilingual code search. Specifically, we first design a general semantic graph construction strategy across different languages based on Intermediate Representation (IR). Furthermore, we integrate a contrastive learning module into a gated graph neural network (GGNN) to enhance query-multilingual code matching. Extensive experiments on three representative languages show that our method outperforms state-of-the-art models by 10.7% to 77.5% in terms of mean reciprocal rank (MRR) on average.
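The abstract names two ideas that a small example may make concrete: a contrastive objective that pulls a query toward its functionally matching code and away from other snippets, and MRR as the retrieval metric. The sketch below is a generic illustration, not the MulCS implementation. It assumes an InfoNCE-style loss with in-batch negatives, cosine similarity, and a temperature of 0.1, none of which the abstract specifies. The function names and the toy vectors are hypothetical; in the paper's setting the embeddings would come from a GGNN over IR-derived semantic graphs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(queries, codes, temperature=0.1):
    """Mean InfoNCE-style loss (assumed form, not taken from the paper).

    codes[i] is treated as the positive for queries[i]; every other
    code in the batch acts as a negative.
    """
    total = 0.0
    for i, query in enumerate(queries):
        logits = [cosine(query, code) / temperature for code in codes]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


def mean_reciprocal_rank(queries, codes):
    """MRR where codes[i] is the single relevant snippet for queries[i]."""
    total = 0.0
    for i, query in enumerate(queries):
        scores = [cosine(query, code) for code in codes]
        # Rank = 1 + number of candidates scored strictly higher than the target.
        rank = 1 + sum(1 for score in scores if score > scores[i])
        total += 1.0 / rank
    return total / len(queries)


# Toy embeddings (hypothetical): two queries and two code snippets.
toy_queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_codes = [[1.0, 0.0], [0.0, 1.0]]      # each query matches its own code
misaligned_codes = [[0.0, 1.0], [1.0, 0.0]]   # positives are swapped

print(info_nce_loss(toy_queries, aligned_codes))
print(info_nce_loss(toy_queries, misaligned_codes))
print(mean_reciprocal_rank(toy_queries, aligned_codes))
print(mean_reciprocal_rank(toy_queries, misaligned_codes))
```

With well-aligned embeddings the loss is close to zero and every target ranks first; with the positives swapped the loss grows and each target drops to rank two, which is the behavior a contrastive objective is meant to penalize.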
Key words
Code search, multi-language, contrastive learning, intermediate representation