EffiBench: Benchmarking the Efficiency of Automatically Generated Code
CoRR (2024)
Abstract
Code generation models have increasingly become integral to aiding software
development, offering assistance in tasks such as code completion, debugging,
and code translation. Although current research has thoroughly examined the
correctness of code produced by code generation models, a vital aspect, i.e.,
the efficiency of the generated code, has often been neglected. This paper
presents EffiBench, a benchmark with 1,000 efficiency-critical coding problems
for assessing the efficiency of code generated by code generation models.
EffiBench contains a diverse set of LeetCode coding problems. Each problem is
paired with an executable human-written canonical solution. With EffiBench, we
empirically examine the capability of 21 large language models (13 open-source
and 8 closed-source) to generate efficient code. The results demonstrate
that GPT-4-turbo generates the most efficient code, significantly outperforming
Palm-2-chat-bison, Claude-instant-1, Gemini-pro, GPT-4, and GPT-3.5.
Nevertheless, its code is still less efficient than the human-written canonical
solutions: the average and worst-case execution times of GPT-4-turbo-generated
code are 1.69 and 45.49 times those of the canonical solutions, respectively.
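The core comparison the abstract describes, timing model-generated code against a human-written canonical solution on the same input, can be sketched as below. The two solution functions and the timing harness are illustrative assumptions for a hypothetical two-sum-style LeetCode problem, not EffiBench's actual implementation.

```python
import timeit

def generated_solution(nums):
    # Stand-in for a model-generated solution: O(n^2) brute-force pair search.
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == 0:
                return (i, j)
    return None

def canonical_solution(nums):
    # Stand-in for a human-written canonical solution: O(n) hash lookup.
    seen = {}
    for i, v in enumerate(nums):
        if -v in seen:
            return (seen[-v], i)
        seen[v] = i
    return None

def time_ratio(candidate, reference, test_input, repeats=5):
    """Return the candidate's execution time as a multiple of the reference's,
    using the minimum over several repeats to reduce timer noise."""
    t_cand = min(timeit.repeat(lambda: candidate(test_input), number=1, repeat=repeats))
    t_ref = min(timeit.repeat(lambda: reference(test_input), number=1, repeat=repeats))
    return t_cand / t_ref

# The only zero-sum pair (19999, -19999) sits at the end of the list,
# forcing the brute-force search into its worst case.
nums = list(range(1, 20000)) + [-19999]
ratio = time_ratio(generated_solution, canonical_solution, nums)
print(f"generated / canonical time ratio: {ratio:.2f}x")
```

In this sketch the efficiency metric is a simple wall-clock ratio on one input; a real benchmark would aggregate over many test cases and report average and worst-case ratios, as the abstract does.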