Improving Cross-Language Code Clone Detection via Code Representation Learning and Graph Neural Networks

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING (2023)

Abstract
Code clone detection is an important aspect of software development and maintenance. The extensive research in this domain has helped reduce the complexity and increase the robustness of source code, thereby assisting bug detection tools. However, the majority of the clone detection literature is confined to a single language. With the increasing prevalence of cross-platform applications, functionality replication across multiple languages is common, resulting in code fragments that have similar functionality but are written in different languages. Since such clones are syntactically unrelated, single-language clone detection tools cannot be applied to them. In this article, we propose Rubhus, a semi-supervised deep learning-based tool capable of detecting clones across different programming languages. Rubhus uses control- and data-flow-enriched abstract syntax trees (ASTs) of code fragments to leverage their syntactic and structural information, and then applies graph neural networks (GNNs) to extract this information for the task of clone detection. We demonstrate the effectiveness of our proposed system through experiments conducted over datasets consisting of Java, C, and Python programs and evaluate its performance in terms of precision, recall, and F1 score. Our results indicate that Rubhus outperforms the state-of-the-art cross-language clone detection tools.
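
To make the idea of a flow-enriched AST concrete, the following is a minimal sketch and not the Rubhus implementation, whose details the abstract does not give. It assumes Python input only, uses the standard `ast` module, and adds just one kind of enrichment: a naive "data" edge from the latest assignment of a variable to each later read. Control-flow edges are omitted. The function name `build_flow_ast_graph` and the edge labels are hypothetical.

```python
import ast


def build_flow_ast_graph(source: str):
    """Return (nodes, edges) for a Python snippet.

    nodes: list of AST node type names, indexed by graph node id.
    edges: set of (src_id, dst_id, kind), kind in {"child", "data"}.
    Assumption: data flow is approximated by "last write reaches next read"
    in traversal order, which ignores branches, loops, and scopes.
    """
    tree = ast.parse(source)
    nodes, edges = [], set()
    last_write = {}  # variable name -> node id of its latest assignment

    def visit(node, parent_id=None):
        node_id = len(nodes)
        nodes.append(type(node).__name__)
        if parent_id is not None:
            edges.add((parent_id, node_id, "child"))

        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load) and node.id in last_write:
                edges.add((last_write[node.id], node_id, "data"))
            elif isinstance(node.ctx, ast.Store):
                last_write[node.id] = node_id

        children = list(ast.iter_child_nodes(node))
        if isinstance(node, ast.Assign):
            # Visit the right-hand side first so `x = x + 1` reads the old x.
            children = [node.value, *node.targets]
        for child in children:
            if isinstance(child, ast.expr_context):
                continue  # skip Load/Store marker nodes
            visit(child, node_id)

    visit(tree)
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = build_flow_ast_graph("x = 1\ny = x + 2\n")
    for src, dst, kind in sorted(edges):
        print(f"{nodes[src]}#{src} -[{kind}]-> {nodes[dst]}#{dst}")
```

A graph of this shape (typed nodes plus labeled edges) is the kind of input a GNN can consume. A cross-language setting would additionally need a parser per language and a shared node vocabulary, which this sketch does not attempt.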
Key words
Codes, Cloning, Syntactics, Semantics, Java, Task analysis, Source coding, Program representation learning, cross-language code clone detection, graph-based neural networks, abstract syntax trees