Non-Autoregressive Line-Level Code Completion

ACM Transactions on Software Engineering and Methodology (2023)

Abstract
Software developers frequently use code completion tools to accelerate development by suggesting subsequent code elements. Researchers usually employ AutoRegressive (AR) decoders to complete code sequences in a left-to-right, token-by-token fashion. To improve the accuracy and efficiency of code completion, we argue that the tokens within a code statement can be predicted concurrently. In this paper, we first conduct an empirical study to analyze the dependency among the target tokens in line-level code completion. The results suggest that generating all the tokens of a statement in parallel is practical. To this end, we introduce SANAR, a simple and effective syntax-aware non-autoregressive model for line-level code completion. To further improve the quality of the generated code, we propose an adaptive, syntax-aware sampling strategy to boost the model's performance. Experimental results on two widely used datasets indicate that our model outperforms state-of-the-art code completion approaches of similar model size by a considerable margin and achieves up to a 9× speed-up over them. Moreover, additional experiments show that SANAR's improvements become even more pronounced at larger model sizes, highlighting their significance.
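
The abstract contrasts left-to-right autoregressive decoding with predicting all tokens of a target statement concurrently. The following is a minimal, hypothetical sketch of non-autoregressive line prediction, not the authors' SANAR implementation: it assumes a toy PyTorch transformer encoder, and names such as ToyParallelDecoder, the vocabulary sizes, and the mask-placeholder scheme are illustrative assumptions.

```python
# Minimal sketch (not the authors' SANAR code): a non-autoregressive decoder
# that predicts every token of a code line in a single forward pass, instead
# of generating the line token by token. Toy sizes and an untrained model.
import torch
import torch.nn as nn

VOCAB, D_MODEL, LINE_LEN = 1000, 64, 8  # hypothetical toy sizes

class ToyParallelDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(D_MODEL, VOCAB)

    def forward(self, context_ids, mask_ids):
        # Concatenate the preceding code context with [MASK] placeholders for
        # the target line, then predict all masked positions at once.
        x = self.embed(torch.cat([context_ids, mask_ids], dim=1))
        h = self.encoder(x)
        return self.out(h[:, context_ids.size(1):])  # logits for the line only

model = ToyParallelDecoder()
context = torch.randint(0, VOCAB, (1, 16))           # preceding code tokens
masks = torch.zeros(1, LINE_LEN, dtype=torch.long)   # [MASK] placeholders
line_logits = model(context, masks)                  # (1, LINE_LEN, VOCAB)
predicted_line = line_logits.argmax(-1)              # all tokens in one step
```

Because every position of the line is filled in one pass rather than through LINE_LEN sequential decoding steps, this style of decoding is what makes the speed-ups reported in the abstract possible; SANAR additionally conditions this process on syntax and an adaptive sampling strategy, which this toy sketch omits.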
Keywords
code completion, neural networks, non-autoregressive generation