A Case Study in Using Discrete-Event Simulation to Improve the Scalability of MG-RAST

SIGSIM-PADS '16: SIGSIM Principles of Advanced Discrete Simulation, Banff, Alberta, Canada, May 2016

Abstract
As the cost of DNA sequencing has decreased, computational biology data processing platforms are receiving an increasingly large volume of data analysis requests. One such platform, the metagenomics analysis server MG-RAST at Argonne National Laboratory, receives several terabytes of data submissions per month. However, MG-RAST currently relies on a central object-based data store, Shock, for data access and storage; under high data transfer loads, Shock can become a bottleneck that increases job response time for end users. In this work, we use discrete-event simulation to explore data proxies together with an enhanced, proxy-aware scheduling methodology designed to reduce the movement of the intermediate data generated during workflow processing. In this approach, Shock is supplemented with proxy storage servers that use solid-state drives, decentralizing the management of intermediate workflow results and thereby reducing their movement. Discrete-event simulation makes it possible to evaluate MG-RAST's performance under increased workloads without disrupting the production system. For our case study, we extrapolate scientific workflows obtained from MG-RAST to represent future usage trends. We demonstrate that adding proxies and the proxy-aware scheduling methodology significantly reduces data movement overhead by distributing the data plane, leading to substantial improvement in end-user job response time.
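To make the central-store versus proxy comparison concrete, here is a minimal discrete-event sketch. It is not the authors' simulator and rests on simplifying assumptions: each job is a linear chain of `stages` tasks that pass `data_mb` of intermediate data from one task to the next; compute capacity is unlimited; every storage server (the central store standing in for Shock, and each proxy) is a single FIFO queue moving one object at a time at a fixed bandwidth; and "proxy-aware scheduling" is reduced to pinning all tasks of a job to one proxy, so only the job's input and final output touch the central store. The function `simulate`, its parameters, and all numbers used with it are hypothetical.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Server:
    """A storage server modeled as a FIFO queue serving one transfer at a time."""
    bandwidth_mb_s: float
    busy: bool = False
    queue: deque = field(default_factory=deque)


def simulate(num_jobs, stages, data_mb, compute_s, central_bw, proxy_bw,
             num_proxies, use_proxies, interarrival=0.0):
    """Return mean job response time (s). Hypothetical model, not MG-RAST's simulator."""
    events = []   # min-heap of (time, tie_breaker, callback)
    counter = 0
    now = 0.0
    central = Server(central_bw)                    # stands in for Shock
    proxies = [Server(proxy_bw) for _ in range(num_proxies)]
    submit, finish = {}, {}

    def schedule(t, fn):
        nonlocal counter
        heapq.heappush(events, (t, counter, fn))
        counter += 1

    def start_next(srv):
        # Begin the next queued transfer, or mark the server idle.
        if not srv.queue:
            srv.busy = False
            return
        srv.busy = True
        size, on_done = srv.queue.popleft()

        def done():
            on_done()
            start_next(srv)
        schedule(now + size / srv.bandwidth_mb_s, done)

    def transfer(srv, size, on_done):
        # Enqueue a transfer; it starts immediately if the server is idle.
        srv.queue.append((size, on_done))
        if not srv.busy:
            start_next(srv)

    def run_stage(job, stage):
        # Assumed proxy-aware placement: pin every stage of a job to one proxy,
        # so intermediate data stays on that proxy.
        store = proxies[job % num_proxies] if use_proxies else central
        last = stage == stages - 1

        def after_read():
            schedule(now + compute_s, after_compute)

        def after_compute():
            # Final results always go back to the central store.
            transfer(central if last else store, data_mb, after_write)

        def after_write():
            if last:
                finish[job] = now
            else:
                run_stage(job, stage + 1)

        # Stage 0 reads the job's input from the central store.
        transfer(central if stage == 0 else store, data_mb, after_read)

    for j in range(num_jobs):
        submit[j] = j * interarrival
        schedule(submit[j], lambda j=j: run_stage(j, 0))

    # Event loop: always advance to the earliest pending event.
    while events:
        now, _, fn = heapq.heappop(events)
        fn()

    return sum(finish[j] - submit[j] for j in range(num_jobs)) / num_jobs
```

For one job with two stages, 100 MB objects, 10 s of compute per stage, a 100 MB/s central store and a 500 MB/s proxy, `simulate(1, 2, 100, 10, 100, 500, 1, False)` gives 24 s and the proxy run gives 22.4 s, because the two intermediate transfers drop from 1 s each to 0.2 s each. The gap widens as more jobs queue on the single central store, which is the contention effect the abstract describes.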