DATALAB: A Platform for Data Analysis and Intervention

PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): PROCEEDINGS OF SYSTEM DEMONSTRATIONS (2022)

Abstract
Despite data's crucial role in machine learning, most existing tools and research focus on building systems on top of existing data rather than on how to interpret and manipulate the data itself. In this paper, we propose DATALAB, a unified data-oriented platform that not only allows users to interactively analyze the characteristics of data but also provides a standardized interface for different data processing operations. Additionally, in view of the ongoing proliferation of datasets, DATALAB offers dataset recommendation and global vision analysis features that help researchers form a better view of the data ecosystem. So far, DATALAB covers 1,715 datasets and 3,583 transformed versions of them (e.g., hyponym replacement), of which 728 datasets support various analyses (e.g., with respect to gender bias) with the help of 140M samples annotated by 318 feature functions. DATALAB is under active development and has recently been upgraded based on reviewers' constructive suggestions. We have released a wealth of resources to meet the diverse needs of researchers: a web platform, the open-sourced code of the web platform, a web API, an open-sourced SDK, a package published on PyPI, and online documentation.
Keywords
datalab analysis, intervention, platform