Solving Data-centric Tasks using Large Language Models
CoRR (2024)
Abstract
Large language models (LLMs) are rapidly replacing help forums like
StackOverflow, and are especially helpful for non-professional programmers and
end users. These users are often interested in data-centric tasks, such as
spreadsheet manipulation and data wrangling, which are hard to solve if the
intent is only communicated using a natural-language description, without
including the data. But how do we decide how much data and which data to
include in the prompt? This paper makes two contributions towards answering
this question. First, we create a dataset of real-world NL-to-code tasks
manipulating tabular data, mined from StackOverflow posts. Second, we introduce
a cluster-then-select prompting technique, which adds the most representative
rows from the input data to the LLM prompt. Our experiments show that LLM
performance is indeed sensitive to the amount of data passed in the prompt, and
that for tasks with a lot of syntactic variation in the input table, our
cluster-then-select technique outperforms a random selection baseline.
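
To make the cluster-then-select idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes that rows can be clustered by a coarse syntactic signature of their cells (runs of letters become `a`, runs of digits become `d`), and that one row per cluster is then drawn round-robin until a row budget `k` is filled. The names `shape`, `cluster_then_select`, and `build_prompt`, the signature heuristic, and the prompt layout are all hypothetical choices made for illustration.

```python
import re
from collections import OrderedDict


def shape(value: str) -> str:
    """Collapse a cell into a coarse syntactic signature (assumed heuristic)."""
    s = re.sub(r"[A-Za-z]+", "a", value)  # any run of letters -> "a"
    s = re.sub(r"[0-9]+", "d", s)         # any run of digits  -> "d"
    return s


def cluster_then_select(rows, k):
    """Group rows by signature, then pick up to k rows round-robin across groups."""
    clusters = OrderedDict()  # keeps clusters in order of first appearance
    for row in rows:
        key = tuple(shape(str(cell)) for cell in row)
        clusters.setdefault(key, []).append(row)

    selected = []
    depth = 0  # index of the row taken from each cluster in this pass
    while len(selected) < k:
        added = False
        for members in clusters.values():
            if depth < len(members) and len(selected) < k:
                selected.append(members[depth])
                added = True
        if not added:  # every cluster is exhausted
            break
        depth += 1
    return selected


def build_prompt(task, header, rows, k=3):
    """Assemble a plain-text prompt: task description plus the selected rows."""
    sample = cluster_then_select(rows, k)
    lines = [task, "", "Sample rows from the input table:", ", ".join(header)]
    lines += [", ".join(str(cell) for cell in row) for row in sample]
    return "\n".join(lines)


# Hypothetical input: one date column written in three different formats.
rows = [
    ("2021-01-05",),
    ("2021-02-11",),
    ("05/01/2021",),
    ("Jan 5, 2021",),
    ("2021-03-09",),
]
print(build_prompt("Normalize the dates to ISO format.", ["date"], rows, k=3))
```

With a budget of three rows, this sketch surfaces one row for each of the three date formats, whereas a random sample of three could easily contain only the majority format. A fuller treatment would likely use a proper clustering algorithm over richer row features; the grouping by exact signature here is only the simplest stand-in.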