Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science
CoRR (2024)
Abstract
In the domain of data science, the predictive tasks of classification,
regression, and imputation of missing values are commonly encountered
challenges associated with tabular data. This research applies Large
Language Models (LLMs) to these predictive tasks. Despite their
proficiency in comprehending natural language, LLMs fall short when dealing
with structured tabular data. This limitation stems from their lack of
exposure to the intricacies of tabular data during their foundational
training. Our research aims to mitigate this gap by compiling a
comprehensive corpus of tables annotated with instructions and executing
large-scale training of Llama-2 on this enriched dataset. Furthermore, we
investigate applying the trained model in zero-shot prediction, few-shot
prediction, and in-context learning scenarios. Through extensive
experiments, our methodology shows significant improvements over existing
benchmarks. These advancements highlight the efficacy of tailoring LLM
training to solve table-related problems in data science, thereby
establishing a new benchmark in the utilization of LLMs for enhancing
tabular intelligence.
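To make the setup concrete, the sketch below shows one common way a table row can be serialized into a natural-language instruction prompt for zero-shot prediction, in the spirit the abstract describes. The column names, task wording, and prompt template are illustrative assumptions, not the paper's actual serialization format.

```python
# Hypothetical row-to-prompt serialization for zero-shot tabular prediction.
# The template and feature names are illustrative assumptions.

def serialize_row(row: dict, target: str) -> str:
    """Render one table row as a natural-language instruction prompt."""
    features = "; ".join(f"{col} is {val}" for col, val in row.items())
    return (
        f"Given a table row where {features}, "
        f"predict the value of '{target}'. Answer:"
    )

prompt = serialize_row(
    {"age": 42, "income": "55k", "owns_home": "yes"},
    target="loan_approved",
)
print(prompt)
```

For few-shot prediction or in-context learning, several labeled rows serialized this way would simply be prepended to the query row before it is sent to the model.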