Big Data Processing: Scalability with Extreme Single-Node Performance

2017 IEEE International Congress on Big Data (BigData Congress) (2017)

Abstract
Contemporary frameworks for data analytics, such as Hadoop, Spark, and Flink, seek to allow applications to scale performance flexibly by adding hardware nodes. However, we find that when the computation on each individual node is optimized, peripheral activities such as creating data partitions, messaging, and synchronizing between nodes diminish the speedup obtainable from adding more hardware. We analyze workloads which distribute operations on correlated data, such as joins and aggregation found in SQL, text similarity searches, and image disparity computations. After optimizing computation on efficient, custom processors, we discover challenges in scaling the applications to hundreds of nodes on a high-bandwidth network. We then describe techniques to overcome these challenges towards prototyping a 512-node system which is able to execute SQL queries offloaded from a commercial database, and outperform SQL-on-Hadoop and traditional parallel RDBMS executions by 173x and 7x, respectively.
Keywords
bigdata, scalability, dynamic network scheduling, shuffle