Repoformer: Selective Retrieval for Repository-Level Code Completion
arXiv (2024)
Abstract
Recent advances in retrieval-augmented generation (RAG) have initiated a new
era in repository-level code completion. However, the invariable use of
retrieval in existing methods exposes issues in both efficiency and robustness,
with a large proportion of the retrieved contexts proving unhelpful or harmful
to code language models (code LMs). To tackle these challenges, this paper
proposes a selective RAG framework where retrieval is avoided when unnecessary.
To power this framework, we design a self-supervised learning approach that
enables a code LM to accurately self-evaluate whether retrieval can improve its
output quality and robustly leverage the potentially noisy retrieved contexts.
Using this LM as both the selective retrieval policy and the generation model,
our framework consistently outperforms the state-of-the-art prompting with an
invariable retrieval approach on diverse benchmarks including RepoEval,
CrossCodeEval, and a new benchmark. Meanwhile, our selective retrieval strategy
results in strong efficiency improvements, by as much as 70%, without harming
performance. We demonstrate that our framework effectively
accommodates different generation models, retrievers, and programming
languages. These advancements position our framework as an important step
towards more accurate and efficient repository-level code completion.
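The core decision loop the abstract describes, where the code LM first self-evaluates whether retrieval would help and retrieval is skipped otherwise, can be sketched as follows. This is a minimal illustrative sketch: the stub classes, method names, and the 0.5 threshold are assumptions for demonstration, not the paper's actual interface.

```python
class StubLM:
    """Stand-in for the code LM; the self-evaluation is faked for the demo."""

    def estimate_retrieval_benefit(self, prompt: str) -> float:
        # The paper trains the LM to self-evaluate whether retrieval helps;
        # here we fake it: pretend retrieval only helps when the prompt
        # mentions a cross-file symbol.
        return 0.9 if "cross_file" in prompt else 0.1

    def generate(self, prompt: str) -> str:
        return f"completion_for({prompt})"


class StubRetriever:
    """Stand-in for a repository-level retriever."""

    def retrieve(self, prompt: str) -> str:
        return "retrieved_context\n"


def complete(lm, retriever, prompt: str, threshold: float = 0.5) -> str:
    """Selective RAG: retrieve only when the LM predicts a benefit."""
    if lm.estimate_retrieval_benefit(prompt) < threshold:
        # Retrieval skipped: generate directly from the in-file context.
        return lm.generate(prompt)
    # Otherwise, prepend the retrieved cross-file context before generating.
    return lm.generate(retriever.retrieve(prompt) + prompt)
```

Because the same LM acts as both the retrieval policy and the generator, the skip path saves the full retrieval and long-context generation cost, which is where the reported efficiency gains come from.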