RABID: A Distributed Parallel R for Large Datasets

BigData Congress (2014)

Abstract
Large-scale data mining and deep data analysis are increasingly important for both enterprise and scientific applications. Statistical languages provide rich functionality and ease of use for data analysis and modeling, and they have a large user base. R is one of the most widely used of these languages, but it is limited to a single-threaded execution model and to problem sizes that fit on a single node. This paper describes a highly parallel R system called RABID (R Analytics for BIg Data) that maintains R compatibility, leverages the MapReduce-like distributed Spark framework, and achieves high performance and scaling across clusters. Our experimental evaluation shows that RABID performs up to 5x faster than Hadoop and 20x faster than RHIPE on two data mining applications.
Keywords
distributed parallel R, large datasets, statistical languages, RABID, statistical analysis, data analysis, single-threaded execution model, big data analytics, MapReduce-like distributed Spark, distributed computing, R, data mining, enterprise applications, distributed processing, scientific applications