Combining Code Context and Fine-grained Code Difference for Commit Message Generation

Asia-Pacific Symposium on Internetware (Internetware)(2022)

Cited 2|Views28
No score
Abstract
Generating natural language messages for source code changes is an essential task in software development and maintenance. Existing solutions mainly treat a piece of code difference as natural language, and adopt seq2seq learning to translate it into a commit message. The basic assumption of such solutions lies in the naturalness hypothesis, i.e., source code written by programming languages is to some extent similar to natural language text. However, compared with natural language, source code also bears syntactic regularities. In this paper, we propose to simultaneously model the naturalness and syntactic regularities of source code changes for commit message generation. Specifically, to model syntactic regularities, we first enlarge the input with additional context information, i.e., the code statements that have dependency with the variables in the code difference, and then extract the paths in the corresponding ASTs. Moreover, to better model code difference, we align the two versions of code before and after the committed code change at token level, and annotate their differences with fine-grained edit operations. The context and difference are simultaneously encoded in a learning framework to generate the commit messages. We collected from GitHub a large dataset containing 480 Java projects with over 160k commits, and the experimental results demonstrate the effectiveness of the proposed approach.
More
Translated text
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined