DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning
CoRR(2024)
摘要
In this work, we investigate the potential of large language models (LLMs)
based agents to automate data science tasks, with the goal of comprehending
task requirements, then building and training the best-fit machine learning
models. Despite their widespread success, existing LLM agents are hindered by
generating unreasonable experiment plans within this scenario. To this end, we
present DS-Agent, a novel automatic framework that harnesses LLM agent and
case-based reasoning (CBR). In the development stage, DS-Agent follows the CBR
framework to structure an automatic iteration pipeline, which can flexibly
capitalize on the expert knowledge from Kaggle, and facilitate consistent
performance improvement through the feedback mechanism. Moreover, DS-Agent
implements a low-resource deployment stage with a simplified CBR paradigm to
adapt past successful solutions from the development stage for direct code
generation, significantly reducing the demand on foundational capabilities of
LLMs. Empirically, DS-Agent with GPT-4 achieves an unprecedented 100
rate in the development stage, while attaining 36
pass rate across alternative LLMs in the deployment stage. In both stages,
DS-Agent achieves the best rank in performance, costing $1.60 and $0.13 per
run with GPT-4, respectively.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要