Multi-lingual Evaluation of Code Generation Models

ICLR 2023

Abstract
We present MBXP, an execution-based code completion benchmark in 10+ programming languages. This collection of datasets is generated by our conversion framework, which translates prompts and test cases from the original MBPP dataset into corresponding data in a target language. Based on this benchmark, we are able to evaluate code generation models in a multi-lingual fashion, and in particular to discover the generalization ability of language models on out-of-domain languages, the advantages of large multi-lingual models over mono-lingual ones, the benefits of few-shot prompting, and zero-shot translation abilities. In addition, we use our code generation model to perform large-scale bootstrapping to obtain synthetic canonical solutions in several languages. These solutions can be used for other code-related evaluations, such as insertion-based, summarization, or code translation tasks, for which we demonstrate results and which we release as part of our benchmark.
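To make the execution-based protocol concrete, below is a minimal sketch of how such scoring typically works: a function-completion prompt, a model completion, and the converted test cases are concatenated into one program and executed, and the sample passes if the program exits cleanly. This is an illustrative assumption about the pipeline, not the authors' released harness; the `passes_tests` helper and the `is_even` task with its assertions are hypothetical stand-ins, not actual MBXP data.

```python
# Minimal sketch of execution-based evaluation (illustrative, not the MBXP harness).
import os
import subprocess
import sys
import tempfile

def passes_tests(prompt: str, completion: str, tests: str, timeout: float = 10.0) -> bool:
    """Run prompt + completion + tests as one program; pass iff it exits cleanly."""
    program = prompt + completion + "\n\n" + tests
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                timeout=timeout, capture_output=True)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False  # a hung sample counts as a failure
    finally:
        os.unlink(path)

# Hypothetical MBPP-style example: prompt, model completion, and test assertions.
prompt = (
    "def is_even(n):\n"
    '    """Return True if n is even."""\n'
)
completion = "    return n % 2 == 0\n"
tests = (
    "assert is_even(4) is True\n"
    "assert is_even(7) is False\n"
)
print(passes_tests(prompt, completion, tests))  # True for a correct completion
```

For a target language other than Python, the same idea applies with that language's compiler or interpreter in place of `sys.executable`, which is why converting the prompts and test cases is sufficient to extend the evaluation to new languages.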
Keywords
code generation, execution-based evaluation, test-based evaluation, language models, multi-lingual code generation benchmark, code insertion, code summarization, robustness for code, code translation, zero-shot code translation, multi-lingual, mono-lingual