E = Mc3: Managing Uncertain Enterprise Data In A Cluster-Computing Environment

MOD(2009)

引用 22|浏览57
暂无评分
摘要
Modern enterprises must manage uncertain data for purposes of risk assessment and decisionmaking under uncertainty. The Monte Carlo approach embodied in the MCDB system of Jampani et al. is well suited for such a task. MCDB can support industrial strength business-intelligence queries over uncertain warehouse data. Moreover, MCDB's extensible approach to specifying uncertainty can also capture complex stochastic prediction models, allowing sophisticated "what-if" analyses within the DBMS. The MCDB computations can be highly CPU intensive, but offer the potential for massive parallelization. To realize this potential, we provide a new system, called MC3 (Monte Carlo Computation on a Cluster), that extends the MCDB approach to the map-reduce processing framework. MC3 can exploit the robustness and scalability of map-reduce, and can handle data stored in non-relational formats. We show how MCDB query plans over "triple bundles" can be translated to sequences of map-reduce operations over nested data, and describe different parallelization schemes. We also provide and analyze several novel distributed algorithms for adding pseudorandom number seeds to tuple bundles. These algorithms ensure statistical correctness of the Monte-Carlo computations while minimizing the seed length. Our experiments show that MC3 can scale well for a variety of workloads.
更多
查看译文
关键词
Uncertain Data,Map-Reduce,Monte Carlo,JSON,JAQL
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要