LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration
CoRR (2024)
Abstract
Data exploration is a challenging process in which users examine a dataset by
iteratively employing a series of queries. While in some cases the user
explores a new dataset to become familiar with it, more often, the exploration
process is conducted with a specific analysis goal or question in mind. To
assist users in exploring a new dataset, Automated Data Exploration (ADE)
systems have been devised in previous work. These systems aim to auto-generate
a full exploration session, containing a sequence of queries that showcase
interesting elements of the data. However, existing ADE systems are often
constrained by a predefined objective function, thus always generating the same
session for a given dataset. Therefore, their effectiveness in goal-oriented
exploration, in which users need to answer specific questions about the data,
is extremely limited.
To this end, this paper presents LINX, a generative system augmented with a
natural language interface for goal-oriented ADE. Given an input dataset and an
analytical goal described in natural language, LINX generates a personalized
exploratory session that is relevant to the user's goal. LINX utilizes a Large
Language Model (LLM) to interpret the input analysis goal, and then derive a
set of specifications for the desired output exploration session. These
specifications are then transferred to a novel, modular ADE engine based on
Constrained Deep Reinforcement Learning (CDRL), which can adapt its output
according to the specified instructions.
To validate LINX's effectiveness, we introduce a new benchmark dataset for
goal-oriented exploration and conduct an extensive user study. Our analysis
underscores LINX's superior capability in producing exploratory notebooks that
are significantly more relevant and beneficial than those generated by existing
solutions, including ChatGPT, goal-agnostic ADE, and commercial systems.