
RepoExec: Evaluate Code Generation with a Repository-Level Executable Benchmark

CoRR (2024)

Abstract
The ability of CodeLLMs to generate executable and functionally correct code at the repository-level scale remains largely unexplored. We introduce RepoExec, a novel benchmark for evaluating code generation at the repository-level scale, emphasizing executability and correctness. RepoExec provides an automated system that verifies requirements and incorporates a mechanism for dynamically generating high-coverage test cases to assess the functionality of generated code. Our work explores a controlled scenario in which developers specify the necessary code dependencies, challenging the model to integrate them accurately. Experiments show that while pretrained LLMs outperform instruction-tuned models in correctness, the latter excel at utilizing the provided dependencies and exhibit debugging capabilities. RepoExec aims to provide a comprehensive evaluation of code functionality and alignment with developer intent, paving the way for more reliable and applicable CodeLLMs in real-world scenarios.
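To make the execution-based evaluation idea concrete, below is a minimal sketch of running a model-generated snippet against a set of test cases and reporting a pass rate. The `evaluate_candidate` helper and the toy tests are hypothetical illustrations under the assumption of Python targets; this is not RepoExec's actual harness, which additionally verifies dependency usage and generates high-coverage tests automatically.

```python
import subprocess
import sys
import tempfile
import textwrap

def evaluate_candidate(candidate_src: str, tests: list[str]) -> float:
    """Run a generated snippet against test snippets in isolated
    subprocesses and return the fraction of tests that pass.

    Hypothetical harness for illustration only.
    """
    passed = 0
    for test in tests:
        # Combine the candidate code with one test into a standalone program.
        program = candidate_src + "\n\n" + textwrap.dedent(test) + "\n"
        with tempfile.NamedTemporaryFile(
            "w", suffix=".py", delete=False
        ) as f:
            f.write(program)
            path = f.name
        try:
            # A test passes iff the combined program exits cleanly.
            result = subprocess.run(
                [sys.executable, path], capture_output=True, timeout=10
            )
            passed += result.returncode == 0
        except subprocess.TimeoutExpired:
            pass  # non-terminating candidates count as failures
    return passed / len(tests)

if __name__ == "__main__":
    candidate = "def add(a, b):\n    return a + b\n"
    tests = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"]
    print(f"pass rate: {evaluate_candidate(candidate, tests):.0%}")
```

Running each test in a separate subprocess isolates crashes in the generated code from the harness, and the timeout lets non-terminating candidates be scored as failures rather than hanging the evaluation.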