Enabling Transparent Acceleration of Big Data Frameworks using Heterogeneous Hardware.

Very Large Data Bases Conference (VLDB)(2022)

引用 5|浏览9
暂无评分
摘要
The ever-increasing demand for high performance Big Data analytics and data processing, has paved the way for heterogeneous hardware accelerators, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs), to be integrated into modern Big Data platforms. Currently, this integration comes at the cost of programmability since the end-user Application Programming Interface (APIs) must be altered to access the underlying heterogeneous hardware. For example, current Big Data frameworks, such as Apache Spark, provide a new API that combines the existing Spark programming model with GPUs. For other Big Data frameworks, such as Flink, the integration of GPUs and FPGAs is achieved via external API calls that bypass their execution models completely. In this paper, we rethink current Big Data frameworks from a systems and programming language perspective, and introduce a novel co-designed approach for integrating hardware acceleration into their execution models. The novelty of our approach is attributed to two key design decisions: a) support for arbitrary User Defined Functions (UDFs), and b) no modifications to the user level API. The proposed approach has been prototyped in the context of Apache Flink, and enables unmodified applications written in Java to run on heterogeneous hardware, such as GPU and FPGAs, transparently to the users. The performance evaluation of the proposed solution has shown performance speedups of up to 65x on GPUs and 184x on FPGAs for suitable workloads of standard benchmarks and industrial use cases against vanilla Flink running on traditional multi-core CPUs.
更多
查看译文
关键词
big data frameworks,transparent acceleration,big data
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要