VOLUME: Enable Large-Scale In-Memory Computation on Commodity Clusters

CloudCom), 2013 IEEE 5th International Conference(2013)

Abstract
Traditional cloud computing technologies, such as MapReduce, use file systems as the system-wide substrate for data storage and sharing. A distributed file system provides a global name space and stores data persistently, but it also introduces significant overhead. Several recent systems store data in DRAM instead and substantially improve the performance of cloud computing systems. However, both our own experience and related work indicate that simply substituting distributed DRAM for the file system does not provide a solid and viable foundation for data storage and processing in the data center environment, and the capacity of such systems is limited by the amount of physical memory in the cluster. To overcome these limitations, we construct VOLUME (Virtual On-Line Unified Memory Environment), a distributed virtual memory that unifies the physical memory and disk resources of many compute nodes into a system-wide data substrate. The new substrate provides a general memory-based abstraction, takes advantage of the DRAM in the system to accelerate computation, and, transparently to programmers, scales the system to handle large datasets by swapping data to disks and remote servers. The evaluation results show that VOLUME is much faster than Hadoop/HDFS, delivering 6-11x speedups on the adjacency list workload. VOLUME is also faster than both Hadoop/HDFS and Spark/RDD for in-memory sorting. For k-means clustering, VOLUME scales linearly to 160 compute nodes on the TH-1/GZ supercomputer.
Keywords
volume, disk resources, large-scale in-memory computation, mapreduce, cloud computing technology, distributed file system, system-wide data substrate, stores data persistently, enable large-scale in-memory computation, storage management, global name, k-means clustering, physical memory, remote servers, distributed dram, virtual on-line unified memory environment, general memory, file system, hadoop-hdfs, data storage, data center environment, distributed virtual memory, volume scale, general memory based abstraction, spark-rdd, cloud computing, cloud computing system, distributed processing, th-1-gz supercomputer, recent system, commodity clusters, data sharing