Multi-lingual Evaluation of Code Generation Models

ICLR 2023

Abstract
We present MBXP, an execution-based code completion benchmark in 10+ programming languages. This collection of datasets is generated by our conversion framework, which translates prompts and test cases from the original MBPP dataset into corresponding data in a target language. Based on this benchmark, we are able to evaluate code generation models in a multi-lingual fashion, and in particular to discover the generalization ability of language models on out-of-domain languages, the advantages of large multi-lingual models over mono-lingual ones, the benefits of few-shot prompting, and zero-shot translation abilities. In addition, we use our code generation model to perform large-scale bootstrapping to obtain synthetic canonical solutions in several languages. These solutions can be used for other code-related evaluations, such as insertion-based, summarization, or code translation tasks, for which we demonstrate results and which we release as part of our benchmark.
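To make the execution-based protocol concrete, below is a minimal sketch of how such scoring typically works: a function-completion prompt, a model completion, and the converted test cases are concatenated into one program and executed, and the sample passes if the program exits cleanly. This is an illustrative assumption about the pipeline, not the authors' released harness; the `passes_tests` helper and the `is_even` task with its assertions are hypothetical stand-ins, not actual MBXP data.

```python
# Minimal sketch of execution-based evaluation (illustrative, not the MBXP harness).
import os
import subprocess
import sys
import tempfile

def passes_tests(prompt: str, completion: str, tests: str, timeout: float = 10.0) -> bool:
    """Run prompt + completion + tests as one program; pass iff it exits cleanly."""
    program = prompt + completion + "\n\n" + tests
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                timeout=timeout, capture_output=True)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False  # a hung sample counts as a failure
    finally:
        os.unlink(path)

# Hypothetical MBPP-style example: prompt, model completion, and test assertions.
prompt = (
    "def is_even(n):\n"
    '    """Return True if n is even."""\n'
)
completion = "    return n % 2 == 0\n"
tests = (
    "assert is_even(4) is True\n"
    "assert is_even(7) is False\n"
)
print(passes_tests(prompt, completion, tests))  # True for a correct completion
```

For a target language other than Python, the same idea applies with that language's compiler or interpreter in place of `sys.executable`, which is why converting the prompts and test cases is sufficient to extend the evaluation to new languages.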
Keywords
code generation, execution-based evaluation, test-based evaluation, language models, multi-lingual code generation benchmark, code insertion, code summarization, robustness for code, code translation, zero-shot code translation, multi-lingual, mono-lingual