Can ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation
CoRR(2023)
Abstract
Recently, the large language models (LLMs) have shown extraordinary ability
in understanding natural language and generating programming code. It has been
a common practice of software engineers to consult LLMs when encountering
coding questions. Although efforts have been made to avoid syntax errors and
align the code with the intended semantics, the reliability and robustness of
the code generationfrom LLMs have not yet been thoroughly studied. The
executable code is not equivalent to the reliable and robust code, especially
in the context of real-world software development. The misuse of APIs in the
generated code could lead to severe problem, such as resource leaks, program
crashes. To make things worse, the users of LLM code generation services are
actually the developers that are most vulnerable to these code that seems right
– They are always novice developers that are not familiar with the APIs that
LLMs generate code for them. Therefore, they could hardly tell the misuse in
the code generated by LLMs, which further facilitates the incorrect code
applied in real-world software. Existing code evaluation benchmark and datasets
focus on crafting small tasks such as programming questions in coding
interviews, which however deviates from the problem that developers would ask
LLM for real-world coding help. To fill the missing piece, in this work, we
propose a dataset RobustAPI for evaluating the reliability and robustness of
code generated by LLMs. We collect 1208 coding questions from StackOverflow on
24 representative Java APIs. We summarize thecommon misuse patterns of these
APIs and evaluate them oncurrent popular LLMs. The evaluation results show that
evenfor GPT-4, 62
unexpected consequences if the code isintroduced into real-world software.
MoreTranslated text
Key words
large language model code,robustness
AI Read Science
Must-Reading Tree
Example
![](https://originalfileserver.aminer.cn/sys/aminer/pubs/mrt_preview.jpeg)
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined