OpenVenus: An Open Service Interface for HPC Environment Based on SLURM.

SmartCom (2022)

Abstract
As more “AI + Field + HPC” applications emerge, two problems have become urgent: scheduling and managing High-Performance Computing (HPC) resources, and delivering HPC applications as fast, efficient “cloud services”. This engineering problem is particularly critical because it affects the progress of scientific research, the development period of research platforms, and the learning cost for scientists. To address it, we design a set of reusable life cycle processes for HPC resources. Based on this life cycle, we propose an open service interface for HPC that uses a contention lock to reduce startup time under multiple refreshes and abnormal retries; active interruption by the user is a typical scenario in the startup phase. Furthermore, we implement a read-write strategy with an overlay based on Singularity to save storage space and improve running speed. To evaluate the serviceability and performance of the proposed interface, we deploy the service on the Venus platform and run a startup comparison experiment. We also test the reduction in storage for 100 users. The experimental results show that, in an HPC environment with SLURM, the proposed open service interface shortens the startup time of applications and services by 46% and reduces storage by at least 25% for each user of the Venus platform.
Keywords
Open service, HPC, Cloud, Container, Singularity, Cloud application
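
The abstract does not spell out how the contention lock works, so the following is only a minimal sketch of one plausible reading: repeated start requests for the same user and application (page refreshes, retries) contend for one lock, and only the first one submits a job while the others reuse its result. The names `StartupCoordinator`, `start`, `cancel`, and the `submit` callback are hypothetical and do not come from the paper; a real deployment would presumably submit to SLURM and coordinate across processes rather than threads.

```python
import threading
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (user, app); a hypothetical way to identify a service


class StartupCoordinator:
    """Illustrative only: allow at most one job submission per (user, app)."""

    def __init__(self, submit: Callable[[str, str], str]) -> None:
        self._submit = submit                 # assumed callback returning a job id
        self._guard = threading.Lock()        # protects the lock table itself
        self._key_locks: Dict[Key, threading.Lock] = {}
        self._jobs: Dict[Key, str] = {}       # job ids of services already started

    def _lock_for(self, key: Key) -> threading.Lock:
        # Create the per-key lock exactly once, even under concurrent calls.
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def start(self, user: str, app: str) -> str:
        key = (user, app)
        # Concurrent refreshes and retries contend here; the first caller
        # submits the job, later callers find the recorded job id and reuse it.
        with self._lock_for(key):
            job_id = self._jobs.get(key)
            if job_id is None:
                job_id = self._submit(user, app)
                self._jobs[key] = job_id
            return job_id

    def cancel(self, user: str, app: str) -> None:
        # Models an active interruption by the user: forget the job so that
        # the next start request submits a fresh one.
        key = (user, app)
        with self._lock_for(key):
            self._jobs.pop(key, None)
```

Under these assumptions, several simultaneous calls to `start("alice", "jupyter")` trigger a single call to `submit`, and a later `cancel` followed by `start` submits again.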