Smart Streaming: A High-Throughput Fault-tolerant Online Processing System

2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2020

Abstract
In recent years, there has been considerable interest in developing frameworks for processing streaming data. Like the precursor commercial systems for data-intensive processing, these systems have largely not used methods popular within the HPC community (for example, MPI for communication). In this paper, we demonstrate a system for stream processing that offers a high-level API to users (similar to MapReduce), is fault-tolerant, and is more efficient and scalable than current solutions. In particular, a cost-efficient MPI/OpenMP-based fault-tolerance scheme is incorporated so that the system can survive node failures with only a modest degradation of performance. We evaluate both the functionality and efficiency of Smart Streaming using four common applications in machine learning and data analytics. A comparison against state-of-the-art streaming frameworks shows that our system boosts the throughput of the test cases by up to 10x and achieves desirable parallelism when scaled out. Additionally, the performance loss upon failures is only proportional to the share of failed resources.
Keywords
streaming data processing, precursor commercial systems, data-intensive processing, stream processing, high-level API, data analytics, high-throughput fault-tolerant online processing system, smart streaming, MPI, OpenMP based fault-tolerant scheme, machine learning, parallelism, resource sharing