TensorFlow Data Validation: Data Analysis and Validation in Continuous ML Pipelines

SIGMOD/PODS '20: International Conference on Management of Data Portland OR USA June, 2020(2020)

引用 31|浏览91
暂无评分
摘要
Machine Learning (ML) research has primarily focused on improving the accuracy and efficiency of the training algorithms while paying much less attention to the equally important problem of understanding, validating, and monitoring the data fed to ML. Irrespective of the ML algorithms used, data errors can adversely affect the quality of the generated model. This indicates that we need to adopt a data-centric approach to ML that treats data as a first-class citizen, on par with algorithms and infrastructure which are the typical building blocks of ML pipelines. In this demonstration we showcase TensorFlow Data Validation (TFDV), a scalable data analysis and validation system for ML that we have developed at Google and recently open-sourced. This system is deployed in production as an integral part of TFX - an end-to-end machine learning platform at Google. It is used by hundreds of product teams at Google and has received significant attention from the open-source community as well.
更多
查看译文
关键词
data management, machine learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要