Compilation of SQL Queries for Efficient Distributed In-Memory Processing.

Sudip Chatterjee, Shubh Sharma, Nithin Ivan, Saumya Verma, Suprio Ray, Mark Stoodley, Calisto Zuzarte, Ian Finlay

CASCON '23: Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering (2023)

Abstract
A query processing engine is the core component of any modern database system. There are several types of query processing engines, each employing different query processing techniques. The speed of data-driven decision-making and analytics is crucial to organizations that build software and system applications, and an intuitive way to speed up database querying is to improve the performance of these engines. Conventionally, databases use a disk-oriented, pull-based or tuple-at-a-time interpreted query evaluation model. This paper introduces CasaDB, a compilation-based, in-memory query processing engine that accepts an SQL query and generates distributed C++ (UPC++) based physical query plans. As part of this work, different models and components of query processing are explored, and efficient Partitioned Global Address Space (PGAS) based parallel programs corresponding to SQL queries are designed and developed. These programs are emitted by a code generator that uses a data-centric compilation strategy. The approach proposed in this paper combines high-performance parallel programs with database query processing to take advantage of advances in available hardware. We conduct an extensive experimental evaluation with the industry-standard TPC-H benchmark. Our experimental evaluation shows that 4-node query execution produces up to 5× speedup in query performance over single-node approaches.
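To make the contrast in the abstract concrete, the following is a minimal sketch, not CasaDB's actual generated code, of how one simple query (roughly `SELECT SUM(price) FROM lineitem WHERE quantity < limit`) might look under a pull-based, tuple-at-a-time model and under a data-centric compiled model. All names (`LineItem`, `Scan`, `Filter`, `sum_pull`, `sum_fused`, `sum_partitioned`) are hypothetical. The partitioned variant only simulates ranks with a sequential loop in plain C++; it stands in for the PGAS idea of each rank working on its own slice and does not use the UPC++ API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory columnar table (illustrative, not from the paper).
struct LineItem {
    std::vector<int32_t> quantity;
    std::vector<double> price;
};

struct Row { int32_t quantity; double price; };

// Pull-based model: every operator exposes next(); the parent pulls
// one tuple at a time through a chain of calls.
class Scan {
public:
    explicit Scan(const LineItem& t) : t_(t), pos_(0) {}
    bool next(Row& out) {
        if (pos_ >= t_.quantity.size()) return false;
        out = Row{t_.quantity[pos_], t_.price[pos_]};
        ++pos_;
        return true;
    }
private:
    const LineItem& t_;
    std::size_t pos_;
};

class Filter {
public:
    Filter(Scan& child, int32_t limit) : child_(child), limit_(limit) {}
    bool next(Row& out) {
        while (child_.next(out)) {
            if (out.quantity < limit_) return true;  // tuple qualifies
        }
        return false;  // child exhausted
    }
private:
    Scan& child_;
    int32_t limit_;
};

double sum_pull(const LineItem& t, int32_t limit) {
    assert(t.quantity.size() == t.price.size());
    Scan scan(t);
    Filter filter(scan, limit);
    double sum = 0.0;
    Row r{};
    while (filter.next(r)) sum += r.price;
    return sum;
}

// Data-centric model: scan, filter, and aggregate are fused into one
// loop, so a tuple's values stay in registers between operators.
double sum_fused(const LineItem& t, int32_t limit) {
    assert(t.quantity.size() == t.price.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < t.quantity.size(); ++i) {
        if (t.quantity[i] < limit) sum += t.price[i];
    }
    return sum;
}

// Simulated partitioned execution: each "rank" runs the fused loop on
// a contiguous slice and the partial sums are combined at the end.
// Assumption: a sequential stand-in for distributed ranks.
double sum_partitioned(const LineItem& t, int32_t limit, std::size_t ranks) {
    assert(ranks > 0);
    assert(t.quantity.size() == t.price.size());
    const std::size_t n = t.quantity.size();
    const std::size_t chunk = (n + ranks - 1) / ranks;  // ceiling division
    double total = 0.0;
    for (std::size_t r = 0; r < ranks; ++r) {
        const std::size_t begin = std::min(r * chunk, n);
        const std::size_t end = std::min(begin + chunk, n);
        double partial = 0.0;
        for (std::size_t i = begin; i < end; ++i) {
            if (t.quantity[i] < limit) partial += t.price[i];
        }
        total += partial;  // in a real system this would be a reduction
    }
    return total;
}
```

All three functions compute the same aggregate. The difference lies in control flow: the pull-based version pays a chain of calls per tuple, while the fused versions run a single tight loop, which is the kind of code a data-centric code generator aims to emit.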