Compiling queries for high-performance computing

semanticscholar (2016)


Abstract

Data-intensive applications motivate the integration of high-productivity query languages with high-performance computing runtimes. We present a technique, compiled parallel pipelines (CPP), for compiling relational query plans into programs suitable for high-performance computing platforms. Rather than composing a sequential query compiler with a high-performance communication library like MPI, we take a holistic approach that leverages the capabilities of parallel languages. For each pipeline in the query plan, CPP generates a parallel partitioned global address space (PGAS) program. This approach affords modular design, and it allows the compiler to reason about whole pipelines that include parallelism and communication. Using PGAS to execute queries efficiently requires designing efficient shared data structures, generating code that avoids extra messages, and mitigating the overhead of an execution model based on fine-grained tasks. We implement our technique as a system called RADISH. Our evaluation shows that CPP is 5.5× faster than compiled iterators on TPC-H queries. To show that RADISH is a practical system for in-memory analytics, we also compare the performance of RADISH on TPC-H with the MPP system DBX and find it to be competitive. Our work takes important first steps toward integrating query processing and distributed HPC.
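
As a rough illustration of the contrast the abstract draws between iterator-style execution and whole-pipeline compilation, the sketch below runs the same filter-and-sum query both ways in plain Python. It is not RADISH's generated code and involves no PGAS runtime or communication: the row schema, the function names, and the list-of-lists stand-in for partitioned data are all assumptions made for this example.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical row schema for illustration: (key, price).
Row = Tuple[int, float]


# Iterator style: each operator is its own generator that pulls rows
# one at a time from its child operator.
def scan(rows: Iterable[Row]) -> Iterator[Row]:
    for row in rows:
        yield row


def select_gt(child: Iterator[Row], threshold: float) -> Iterator[Row]:
    for row in child:
        if row[1] > threshold:
            yield row


def sum_price(child: Iterator[Row]) -> float:
    total = 0.0
    for row in child:
        total += row[1]
    return total


# Pipeline style: scan, filter, and aggregate fused into one loop over
# a single partition, with no per-operator call boundary.
def fused_pipeline(partition: List[Row], threshold: float) -> float:
    total = 0.0
    for _key, price in partition:
        if price > threshold:
            total += price
    return total


def run_partitioned(partitions: List[List[Row]], threshold: float) -> float:
    # In a parallel setting each partition would be handled by a separate
    # worker; here they are processed sequentially and partial sums combined.
    return sum(fused_pipeline(p, threshold) for p in partitions)


partitions = [[(1, 5.0), (2, 20.0)], [(3, 30.0), (4, 1.0)]]
all_rows = [row for p in partitions for row in p]

iterator_result = sum_price(select_gt(scan(all_rows), 10.0))
pipeline_result = run_partitioned(partitions, 10.0)
```

Both variables end up holding the same total. The difference is only in how the work is organized: the fused version exposes the whole pipeline as a single loop body, which is the kind of unit a compiler can reason about as a whole.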
