Multi-node Big Data VM Platform and Job Submission Portlet

2017 5th Intl Conf on Applied Computing and Information Technology/4th Intl Conf on Computational Science/Intelligence and Applied Informatics/2nd Intl Conf on Big Data, Cloud Computing, Data Science (ACIT-CSII-BCD)(2017)

Abstract
This study uses VirtualBox virtualization to build a compact, personal multi-node big data VM platform with a Spark and Hadoop cluster. The platform replicates a cluster environment in which developers can easily design and implement Spark and Hadoop Map/Reduce programs. With the multi-node Hadoop VM system, developers can write Map/Reduce programs exactly as they would on a real multi-node Hadoop cluster. To demonstrate its capability and applicability, this study benchmarks the big data VM platform against a physical multi-node Hadoop cluster. On the standard WordCount benchmark, the physical multi-node Hadoop cluster is 3.7 times faster than the VM Hadoop cluster. The benchmark results show that the big data VM platform is an ideal platform for portal development, Map/Reduce programming, Spark programming, and testing, and that the physical Hadoop cluster is the most appropriate for production runs. In addition, the big data VM platform contains a web portal development module designed to support applications that implement big data computing services for engineering and science users. Such applications are inherently complex, potentially accessing data from a variety of sources and distributing applications to a variety of clients. The portal development module can serve multiple roles across projects, including personal portals, small business portals, enterprise portals, educational portals, infrastructure portals, and other types of portals. Finally, the big data VM platform, as a big data development platform, is ready for users to download. The first author of this paper would like to give a demonstration of the proposed multi-node big data VM platform.
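For readers unfamiliar with the WordCount benchmark named in the abstract, the following is a minimal pure-Python sketch of the map, shuffle, and reduce phases that such a job performs. It is not the authors' benchmark code and does not use Hadoop or Spark; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted counts by word, as the framework would between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data platform", "Big data cluster"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel across nodes and the shuffle moves data over the network, which is where a VM-based cluster and a physical cluster would be expected to differ in running time.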
Keywords
Big Data, Spark, Hadoop, Computation, MapReduce, Personal Platform, Portal