Monetary Cost Optimizations for HPC Applications on Amazon Clouds: Checkpoints and Replicated Execution

semanticscholar(2014)

Abstract
I. MOTIVATION

Recently, many emerging high performance computing (HPC) and scientific computing applications have been developed and hosted in the cloud. Because these applications are usually long-running jobs that are costly in the cloud, monetary cost [11], [7] and performance [3], [2] are important optimization factors. Message Passing Interface (MPI) is the key programming paradigm for developing HPC and scientific applications. This motivates us to investigate whether and how we can reduce the monetary cost of MPI-based applications under a performance constraint in the cloud.

The cloud has evolved into an economic market. Besides on-demand instances, which charge users at a fixed rate, Amazon EC2 provides spot instances, whose prices are mainly determined by supply and demand in the market. Table I shows statistics of the price history of four types of spot and on-demand instances on Amazon in the US East region during August 2013. We make the following observations: a) Spot instances are usually much cheaper than on-demand instances, although there are some "outlier" points where the maximum spot price is much higher than the on-demand price. If spot instances are leveraged properly, they can reduce monetary cost [10] in comparison with on-demand-only solutions. b) Different instance types show different price variations. These observations are consistent with previous studies [6].

Leveraging spot instances is an ideal approach to reducing the monetary cost of MPI executions. However, a spot instance can be terminated whenever the spot price is higher than the bidding price (i.e., an out-of-bid event). We have observed that the spot price is highly dynamic in both spatial and temporal dimensions. Spatially, different clouds (e.g., different Amazon EC2 zones) have very different spot prices. Temporally, spot prices can be rather stable at some times and change dramatically at others.
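To make the out-of-bid condition concrete, the following is a minimal sketch that scans a price trace and reports when the spot price exceeds the bid. The function name `out_of_bid_hours`, the hourly granularity, and the price values are hypothetical and chosen only for illustration; they are not taken from Table I or from the authors' implementation.

```python
from typing import List


def out_of_bid_hours(spot_prices: List[float], bid: float) -> List[int]:
    """Return the indices (hours) at which the spot price exceeds the bid.

    Each returned index marks an out-of-bid event, i.e. a point at which
    a spot instance held at this bid could be terminated.
    """
    return [hour for hour, price in enumerate(spot_prices) if price > bid]


if __name__ == "__main__":
    # Made-up hourly spot prices in $/hour (illustrative only).
    prices = [0.03, 0.03, 0.05, 0.12, 0.04]
    print(out_of_bid_hours(prices, bid=0.10))
```

With a bid of $0.10/hour, only the fourth hour (index 3) in this made-up trace is an out-of-bid event; a lower bid would produce more such events.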
Due to the spot price dynamics, failures can occur in MPI executions. To satisfy the performance requirement (usually in the form of a deadline), fault-tolerant execution is necessary. In this paper, we investigate two common fault-tolerance mechanisms for MPI: checkpointing and replicated execution. These two mechanisms are complementary. Checkpointing can reduce the execution time when a failure occurs, and replicated execution can reduce the failure rate in the spot market. When the spot price is stable, checkpointing is not necessary. When the spot price varies sharply, checkpointing becomes more useful.

TABLE I. STATISTICS ON SPOT PRICES ($/HOUR, AUGUST 2013, US EAST REGION) AND ON-DEMAND PRICES OF AMAZON EC2.