RepBun: Load-Balanced, Shuffle-Free Cluster Caching for Structured Data

IEEE INFOCOM 2020 - IEEE Conference on Computer Communications (2020)

Abstract
Cluster caching systems increasingly store structured data objects in columnar format. However, these systems routinely face imbalanced load, which significantly impairs I/O performance. Existing load-balancing solutions, while effective for reading unstructured data objects, fall short in handling columnar data. Unlike unstructured data, which can only be read through a full-object scan, columnar data supports direct queries on specific columns and exhibits two distinct access patterns: (1) column popularity is heavily skewed, and (2) hot columns are likely to be accessed together in a query job. Based on these two access patterns, we propose an effective load-balancing solution for structured data. Our solution, which we call RepBun, groups hot columns into a bundle. It then creates multiple replicas of the column bundle and stores them uniformly across servers. We show that RepBun achieves improved load balancing with reduced memory overhead, while avoiding data shuffling between cache servers. We implemented RepBun atop Alluxio, a popular in-memory distributed storage system, and evaluated its performance through an EC2 deployment against the TPC-H benchmark workload. Experimental results show that RepBun outperforms existing load-balancing solutions with significantly shorter read latency and faster query completion.
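The bundling and replication idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the hotness rule (top-k columns by access count), the evenly spaced placement, the round-robin routing, and all function and column names here are assumptions made purely for illustration.

```python
from collections import Counter
from typing import List, Sequence


def pick_hot_columns(access_log: Sequence[Sequence[str]], top_k: int) -> List[str]:
    """Return the top_k most frequently accessed columns.

    Assumption: "hot" simply means highest access count in the log.
    """
    counts = Counter(col for query in access_log for col in query)
    # Sort by descending count, then by name, so ties break deterministically.
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    return ranked[:top_k]


def place_bundle_replicas(num_servers: int, num_replicas: int) -> List[int]:
    """Spread replicas of the whole bundle evenly over servers 0..num_servers-1.

    Assumption: "uniformly" is modeled as evenly spaced server indices.
    """
    if not 1 <= num_replicas <= num_servers:
        raise ValueError("need 1 <= num_replicas <= num_servers")
    return [(i * num_servers) // num_replicas for i in range(num_replicas)]


def route_query(query_id: int, replica_servers: List[int]) -> int:
    """Pick one replica server for a query, round-robin by query id (hypothetical policy)."""
    return replica_servers[query_id % len(replica_servers)]


# Hypothetical access log: each inner list is the set of columns one query reads.
access_log = [
    ["price", "qty"],
    ["price", "qty", "date"],
    ["price"],
    ["comment"],
]
bundle = pick_hot_columns(access_log, top_k=2)
replica_servers = place_bundle_replicas(num_servers=8, num_replicas=4)
```

Because each replica server in this sketch holds the entire bundle, a query that touches only hot columns can be answered by a single server, which is one way to read the abstract's claim of avoiding data shuffling between cache servers.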
Keywords
groups hot columns, column bundle, RepBun, load balancing, cache servers, TPC-H benchmark workload, existing load-balancing solutions, load-balanced, structured data, cluster caching systems, columnar format, imbalanced load, unstructured data objects, columnar data, direct query, specific columns, distinct access patterns, heavily skewed popularity, effective load-balancing solution