Encoding Version History Context for Better Code Representation
CoRR (2024)
Abstract
With the exponential growth of AI tools that generate source code,
understanding software has become crucial. When developers comprehend a
program, they may consult additional context, such as program documentation or
historical code versions. Therefore, we argue that
encoding this additional contextual information could also benefit code
representation for deep learning. Recent papers incorporate contextual data
(e.g. call hierarchy) into vector representation to address program
comprehension problems. This motivates further studies to explore additional
contexts, such as version history, to enhance models' understanding of
programs. In particular, version history can reveal patterns in how code
evolves over time, recurring issues, and the effectiveness of past
solutions. Our paper presents preliminary evidence of the potential benefit of
encoding contextual information from the version history to predict code clones
and perform code classification. We experiment with two representative deep
learning models, ASTNN and CodeBERT, to investigate whether combining
additional contexts with different aggregations may benefit downstream
activities. The experimental results confirm the positive impact of
incorporating version history into source code representation in all scenarios;
however, to ensure the technique performs consistently, a holistic investigation
on a larger code base is needed, using different combinations of contexts,
aggregations, and models. Therefore, we propose a research agenda aimed at
exploring various aspects of encoding additional context to improve code
representation and its optimal utilisation in specific situations.
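
As a rough illustration of what "combining additional contexts with different aggregations" could look like, the sketch below pools embeddings of historical versions into one context vector and then merges it with the embedding of the current code. The abstract does not specify the aggregation schemes used, so the function names (`aggregate_versions`, `combine`) and the choices of mean/max pooling and concatenation/summation are assumptions made for illustration only. In practice the vectors would come from an encoder such as ASTNN or CodeBERT; here they are plain lists of floats.

```python
from typing import List, Sequence

Vector = List[float]


def aggregate_versions(version_vecs: Sequence[Vector], how: str = "mean") -> Vector:
    """Pool embeddings of historical versions into one context vector.

    Hypothetical helper: the pooling options here are assumed, not taken
    from the paper.
    """
    if not version_vecs:
        raise ValueError("need at least one version embedding")
    dim = len(version_vecs[0])
    if any(len(v) != dim for v in version_vecs):
        raise ValueError("all version embeddings must have the same dimension")

    # Transpose so each tuple holds one dimension across all versions.
    columns = list(zip(*version_vecs))
    if how == "mean":
        return [sum(col) / len(col) for col in columns]
    if how == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown aggregation: {how}")


def combine(current_vec: Vector, context_vec: Vector, how: str = "concat") -> Vector:
    """Merge the current-code embedding with the pooled history context."""
    if how == "concat":
        # Output dimension is the sum of both input dimensions.
        return list(current_vec) + list(context_vec)
    if how == "sum":
        if len(current_vec) != len(context_vec):
            raise ValueError("element-wise sum needs equal dimensions")
        return [a + b for a, b in zip(current_vec, context_vec)]
    raise ValueError(f"unknown combination: {how}")


if __name__ == "__main__":
    history = [[1.0, 4.0], [3.0, 2.0]]   # toy embeddings of two past versions
    current = [0.5, 0.5]                 # toy embedding of the current code
    context = aggregate_versions(history, how="mean")
    print(combine(current, context, how="concat"))
```

The combined vector would then be passed to a downstream classifier or clone detector. Concatenation keeps the two sources separable at the cost of a larger input, while summation keeps the dimension fixed.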