PipeGen: Data Pipe Generator for Hybrid Analytics
SoCC '16: ACM Symposium on Cloud Computing, Santa Clara, CA, USA, October 2016
Abstract
As the number of big data management systems continues to grow, users increasingly seek to leverage multiple systems in the context of a single data analysis task. To efficiently support such hybrid analytics, we develop a tool called PipeGen for efficient data transfer between database management systems (DBMSs). PipeGen automatically generates data pipes between DBMSs by leveraging their functionality to transfer data via disk files using common data formats such as CSV. PipeGen creates data pipes by extending such functionality with efficient binary data transfer capabilities that avoid file system materialization, include multiple important format optimizations, and transfer data in parallel when possible. We evaluate our PipeGen prototype by generating 20 data pipes automatically between five different DBMSs. The results show that PipeGen speeds up data transfer by up to 3.8x as compared to transferring using disk files.
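The core idea of a data pipe, streaming rows in a binary format directly between two systems instead of writing and re-reading a CSV file, can be illustrated with a minimal sketch. This is not PipeGen's generated code: the function names (`export_rows`, `import_rows`, `transfer`), the fixed two-column row layout, and the use of a local socket pair are all assumptions made for illustration, and the sketch omits the format optimizations and parallel transfer the abstract mentions.

```python
import socket
import struct
import threading

# Hypothetical fixed-width row layout: (int64 id, float64 value), little-endian.
ROW = struct.Struct("<qd")


def export_rows(sock, rows):
    """Source side: send rows as packed binary instead of CSV text on disk."""
    with sock:  # closing the socket signals end of stream to the reader
        for row in rows:
            sock.sendall(ROW.pack(*row))


def import_rows(sock):
    """Destination side: decode packed rows straight off the socket."""
    rows = []
    buf = b""
    with sock:
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # sender closed its end
                break
            buf += chunk
            # Decode every complete row currently in the buffer.
            while len(buf) >= ROW.size:
                rows.append(ROW.unpack(buf[:ROW.size]))
                buf = buf[ROW.size:]
    return rows


def transfer(rows):
    """Move rows from 'exporter' to 'importer' without touching the file system."""
    src, dst = socket.socketpair()
    sender = threading.Thread(target=export_rows, args=(src, rows))
    sender.start()
    received = import_rows(dst)
    sender.join()
    return received
```

Calling `transfer([(1, 2.5), (2, -0.25)])` returns the same rows on the receiving side. The sketch shows why such a pipe can be faster than a file-based hand-off: no intermediate file is materialized, and numeric values are never converted to and from text.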
Keywords
Hybrid analytics, heterogeneous data transfer