Kubeflow-based Automatic Data Processing Service for Data Center of State Grid Scenario

19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021)(2021)

Abstract
With the rapid development of machine learning and deep learning, applications of both increasingly appear in the power grid business. Data processing in the State Grid business is complicated and cumbersome, and the reuse rate of data processing code is very low. To solve these problems, this paper proposes an efficient automated data processing service, EADP (Efficient Automated Data Processing). The EADP service is built on Kubeflow. Kubeflow/Pipeline is Google's open source workflow for building end-to-end services. Users can build their code as a Pipeline/Component for use by Kubeflow/Pipeline. However, constructing a Component or Pipeline in Kubeflow is extremely cumbersome, and Kubeflow lacks management of Components and Pipelines. To address this, EADP provides automatic construction of Components and of the data processing DAG. There is a one-to-one correspondence between a Component and a Docker/Image. The Docker/Image contains the code blocks for data processing, which can be run after instantiation. A data processing flow can be constructed as a data processing DAG. The DAG is composed of Components, and each node in the DAG corresponds to one Component. EADP uses a topological sorting algorithm to convert the data processing DAG into a Kubeflow/Pipeline, thereby realizing automated data processing. Experiments show that EADP is stable and convenient, and that it can greatly shorten the time spent on data processing.
Keywords
Pipeline technology,Component technology,Topological sorting,Docker/Image
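
The abstract states that EADP converts the data processing DAG into a Kubeflow/Pipeline with a topological sorting algorithm, but does not say which variant is used. As an illustration only, the following minimal sketch uses Kahn's algorithm to order the nodes of a DAG so that every Component comes after the Components it depends on. The function name `topological_order`, the adjacency-dictionary representation, and the component names in `example_dag` are hypothetical and do not come from the paper; no Kubeflow API is called.

```python
from collections import deque


def topological_order(dag):
    """Return node names so that each node appears after all of its predecessors.

    `dag` maps a component name to the list of components that consume its
    output (assumed representation, not the paper's).
    """
    # Count incoming edges for every node, including nodes that only
    # appear as successors.
    indegree = {node: 0 for node in dag}
    for successors in dag.values():
        for succ in successors:
            indegree[succ] = indegree.get(succ, 0) + 1

    # Start with nodes that have no unmet dependencies.
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # Finishing `node` may unblock its successors.
        for succ in dag.get(node, []):
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)

    # If some nodes were never unblocked, the graph has a cycle.
    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle; not a valid DAG")
    return order


# Hypothetical data processing flow: one loader feeding two steps that
# are merged at the end.
example_dag = {
    "load": ["clean", "validate"],
    "clean": ["merge"],
    "validate": ["merge"],
    "merge": [],
}

if __name__ == "__main__":
    print(topological_order(example_dag))
```

In a setting like the one the abstract describes, an order of this kind could be used to emit pipeline steps one by one, with each step wired to the outputs of steps already emitted. The cycle check matters because a flow with circular dependencies cannot be scheduled at all.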