QaaD (Query-as-a-Data): Scalable Execution of Massive Number of Small Queries in Spark.

Proc. ACM Manag. Data (2023)

Abstract
The Spark big data processing platform is heavily used in today's IT services for critical applications such as machine learning for service recommendation and analysis of massive volumes of raw sales data. Spark is designed to deliver high performance by enabling a high degree of parallelism when processing heavyweight queries that apply homogeneous operations to large data. However, workloads made up of small, short-running queries from various sources have been observed to be becoming dominant in practice. Unfortunately, the current Spark architecture cannot process workloads consisting of a large number of small queries optimally, due to excessive I/O accompanying small computations. We present a technique, called QaaD, that addresses this problem fundamentally by applying (i) transparent conversion of a workload of small queries into one of large queries and (ii) dynamic partition size adjustment to minimize runtime overhead. To this end, we introduce a new abstraction, microRDD, which supports our design of query merging, the embedding of queries as part of the data, and opportunistic sharing of common input data among queries. Comprehensive evaluation using real-world data shows that QaaD delivers a 10.6x to 36.6x speed-up over standard Spark execution for small-query workloads.
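The general idea of merging many small queries into one larger pass, with the query identity carried alongside the data, can be illustrated with a minimal plain-Python sketch. This is not the paper's microRDD implementation and does not use Spark; the record fields (`region`, `amount`), the query format, and the function names `run_separately` and `run_merged` are all hypothetical, chosen only to show the contrast between one scan per query and a single shared scan.

```python
from collections import defaultdict

def run_separately(queries, rows):
    """Baseline: every small query performs its own full scan of the input."""
    results = {}
    for q in queries:
        results[q["id"]] = sum(
            r["amount"] for r in rows if r["region"] == q["region"]
        )
    return results

def run_merged(queries, rows):
    """Merged: one scan of the input, with query ids attached to matching rows.

    Queries that read the same input (here, the same region) share each row
    instead of re-reading it.
    """
    # Hypothetical index from filter value to the queries interested in it.
    by_region = defaultdict(list)
    for q in queries:
        by_region[q["region"]].append(q["id"])

    totals = {q["id"]: 0 for q in queries}
    for r in rows:  # single pass over the shared input
        for qid in by_region.get(r["region"], ()):
            totals[qid] += r["amount"]  # the query id acts as part of the key
    return totals

# Illustrative data only; not taken from the paper.
rows = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 5},
    {"region": "EU", "amount": 7},
    {"region": "APAC", "amount": 3},
]
queries = [
    {"id": "q1", "region": "EU"},
    {"id": "q2", "region": "US"},
    {"id": "q3", "region": "EU"},
]

separate = run_separately(queries, rows)
merged = run_merged(queries, rows)
print(merged)
```

Both functions return the same per-query totals; the merged version reads the input once regardless of how many queries there are, which is the kind of I/O saving the abstract attributes to query merging and input sharing.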
Keywords
scalable execution, small queries, QaaD, Query-as-a-Data