Data-driven Discovery with Large Generative Models
CoRR (2024)
Abstract
With the accumulation of data at an unprecedented rate, its potential to fuel
scientific discovery is growing exponentially. This position paper urges the
Machine Learning (ML) community to exploit the capabilities of large generative
models (LGMs) to develop automated systems for end-to-end data-driven discovery
– a paradigm encompassing the search and verification of hypotheses purely
from a set of provided datasets, without the need for additional data
collection or physical experiments. We first outline several desiderata for an
ideal data-driven discovery system. Then, through DATAVOYAGER, a
proof-of-concept utilizing GPT-4, we demonstrate how LGMs fulfill several of
these desiderata – a feat previously unattainable – while also highlighting
important limitations in the current system that open up opportunities for
novel ML research. We contend that achieving accurate, reliable, and robust
end-to-end discovery systems solely through the current capabilities of LGMs is
challenging. We instead advocate for fail-proof tool integration, along with
active user moderation through feedback mechanisms, to foster data-driven
scientific discoveries with efficiency and reproducibility.