Exploring Benefits of NVMe SSDs for BigData Processing in Enterprise Data Centers

2019 5th International Conference on Big Data Computing and Communications (BIGCOM)(2019)

Abstract
Big data processing frameworks such as Apache Spark are widely deployed for applications with large-scale workloads. New storage technologies such as Non-Volatile Memory Express Solid State Drives (NVMe SSDs) provide higher throughput than traditional Hard Disk Drives (HDDs), and NVMe SSDs are therefore rapidly replacing HDDs in modern data centers. In this paper, we explore whether it is critically necessary to use NVMe SSDs for large workloads running on the Spark big data framework. Specifically, we investigate which factors of application design and of the Spark data processing framework determine whether the benefits of NVMe SSDs can be exploited. Results from our real-system experiments reveal that some applications, even with large workloads, cannot fully utilize NVMe SSDs to obtain high I/O throughput. Interestingly, we find that characteristics of the Spark data processing framework, such as shuffling (i.e., the volume of transition data generated by an application) and parallelism (i.e., the number of concurrently running tasks), have a crucial impact on the performance of big data applications running on NVMe SSDs.
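To make the parallelism point concrete, here is a minimal toy model in Python. It is an illustration under assumed conditions, not the paper's methodology: it assumes each concurrently running task issues I/O at a fixed rate (`per_task_mb_s`), that a device delivers at most a fixed bandwidth, and that a job must move a fixed volume of data (input plus shuffle). All function names and numbers are hypothetical.

```python
# Hypothetical toy model (not from the paper): a device is only saturated
# when the concurrently running tasks together issue enough I/O to fill it.


def achieved_throughput(concurrent_tasks: int, per_task_mb_s: float, device_mb_s: float) -> float:
    """Aggregate I/O rate in MB/s, capped at the device's maximum bandwidth."""
    demand = concurrent_tasks * per_task_mb_s
    return min(demand, device_mb_s)


def utilization(concurrent_tasks: int, per_task_mb_s: float, device_mb_s: float) -> float:
    """Fraction of the device bandwidth actually used (0.0 to 1.0)."""
    return achieved_throughput(concurrent_tasks, per_task_mb_s, device_mb_s) / device_mb_s


def io_time_s(total_io_mb: float, concurrent_tasks: int, per_task_mb_s: float, device_mb_s: float) -> float:
    """Seconds spent on I/O; total_io_mb is assumed to include shuffle data."""
    return total_io_mb / achieved_throughput(concurrent_tasks, per_task_mb_s, device_mb_s)


def nvme_speedup(total_io_mb: float, concurrent_tasks: int, per_task_mb_s: float,
                 hdd_mb_s: float, nvme_mb_s: float) -> float:
    """Ratio of HDD I/O time to NVMe I/O time for the same job."""
    hdd = io_time_s(total_io_mb, concurrent_tasks, per_task_mb_s, hdd_mb_s)
    nvme = io_time_s(total_io_mb, concurrent_tasks, per_task_mb_s, nvme_mb_s)
    return hdd / nvme


if __name__ == "__main__":
    # Hypothetical figures chosen only for illustration, not measured values.
    HDD_MB_S = 150.0
    NVME_MB_S = 2000.0
    PER_TASK_MB_S = 100.0
    TOTAL_IO_MB = 10_000.0  # input plus shuffle data for one job (assumed)

    for tasks in (1, 4, 16, 32):
        print(
            f"{tasks:>2} tasks: "
            f"HDD util {utilization(tasks, PER_TASK_MB_S, HDD_MB_S):.0%}, "
            f"NVMe util {utilization(tasks, PER_TASK_MB_S, NVME_MB_S):.0%}, "
            f"NVMe speedup {nvme_speedup(TOTAL_IO_MB, tasks, PER_TASK_MB_S, HDD_MB_S, NVME_MB_S):.2f}x"
        )
```

In this toy model a single task drives both devices at the same 100 MB/s, so the NVMe drive yields no speedup; the advantage appears only once enough tasks run concurrently to exceed the HDD's bandwidth. This qualitatively mirrors the abstract's observation that parallelism governs whether NVMe SSDs pay off, but it does not reproduce the paper's measurements.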
Keywords
NVMe SSD, Throughput, Spark, I/O Access Pattern, Data Processing