DataHub: Collaborative Data Science & Dataset Version Management at Scale.

CIDR (2015)

Abstract
Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale.
Keywords
dataset version management, collaborative datahub science, datahub science