Towards Building A Scalable Data Analytics System On Clouds: An Early Experience On Alicloud
PROCEEDINGS 2018 IEEE 11TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD)(2018)
Abstract
With the growth of big data, processing systems such as Hadoop and Spark are widely used to handle large-scale data. To spare users the complexity and cost of building their own big data processing systems, cloud providers tend to deploy big data processing tools as cloud services. Typical examples include Amazon EMR, Azure HDInsight and AliCloud E-MapReduce. However, building a cost-efficient system and scaling it remain challenging. In this paper, we conduct a case study on AliCloud E-MapReduce and analyze system performance on local and remote file systems. We compare the scalability of Hadoop and Spark using scale-out and scale-up strategies, respectively. Based on the analysis results, we derive several observations and implications that can help guide performance optimization.
Keywords
scalability evaluation, cloud-based data processing, SaaS