HPCI: A Perl module for writing cluster-portable bioinformatics pipelines

bioRxiv (2018)

Abstract
Background: Most biocomputing pipelines are run on clusters of computers. Each type of cluster has its own API (application programming interface), which defines how a program running on the cluster must request the submission, content and monitoring of its jobs. Sometimes it is desirable to run the same pipeline on different types of cluster. This can happen, for example, when:
- different labs are collaborating, but they do not use the same type of cluster;
- a pipeline is released to other labs as open-source or commercial software;
- a lab has access to multiple types of cluster and wants to choose between them for scaling, cost or other purposes;
- a lab is migrating its infrastructure from one cluster type to another;
- a user is testing or travelling, situations in which it is often desirable to run on a single computer.
However, because each type of cluster has its own API, code that runs jobs on one type of cluster must be rewritten before the application can run on a different type. To resolve this problem, we created a software module that generalizes the submission of pipelines across computing environments, including local computers, clouds and clusters.

Results: HPCI (High Performance Computing Interface) is a Perl module that provides the interface to a standardized, generic cluster. When the HPCI module is used, it accepts a parameter specifying the cluster type. HPCI uses this parameter to load a driver module from the HPCD:: namespace, which translates the abstract HPCI interface into the cluster's specific software interface. Simply by changing the cluster parameter, the same pipeline can be run on a different type of cluster with no other changes.

Conclusion: The HPCI module helps in writing Perl programs that can run in different lab environments, with different site configuration requirements and different types of hardware cluster.
Rather than rewriting portions of the program, it is only necessary to change a configuration file. Using HPCI, an application can manage collections of jobs to be run, specify ordering dependencies between them, detect whether each job succeeded or failed, and automatically retry failed jobs, optionally with a changed configuration (for example, when the original attempt requested too little memory).

Keywords: portability; cluster; environment; pipeline
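The abstract describes HPCI only at the interface level. The following is a minimal sketch of the same design, written in Python rather than Perl and not using HPCI's real API. A pipeline names its cluster type once, and a driver registry maps that name to a back end. A job group then runs staged jobs in dependency order, marks jobs whose prerequisites failed as skipped, and retries a failed job with a larger memory allotment. The names `Group`, `Job`, `stage`, `execute`, `LocalDriver` and `DRIVERS`, and the memory-doubling retry policy, are all hypothetical. The `align` job only simulates a memory requirement and does not run a real program.

```python
# Hypothetical sketch: NOT HPCI's Perl API. It illustrates a generic job-group
# interface whose back end is selected by a single "cluster" parameter.
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable


@dataclass
class Job:
    name: str
    action: Callable[["Job"], bool]  # stands in for a real command; True means success
    memory_mb: int = 1000
    depends_on: list = field(default_factory=list)


class LocalDriver:
    """Runs each job in-process, as when a pipeline runs on a single computer."""

    def run(self, job: Job) -> bool:
        return job.action(job)


# Registry mapping cluster type to driver class. Supporting another cluster
# would mean adding a driver here, not editing the pipeline code.
DRIVERS = {"local": LocalDriver}


class Group:
    """A collection of jobs run through whichever driver the cluster name selects."""

    def __init__(self, cluster: str, max_attempts: int = 3):
        self.driver = DRIVERS[cluster]()  # the only cluster-specific step
        self.max_attempts = max_attempts
        self.jobs = {}

    def stage(self, job: Job) -> None:
        self.jobs[job.name] = job

    def execute(self) -> dict:
        # A job's predecessors are its dependencies, so the sorter yields a valid run order.
        graph = {name: job.depends_on for name, job in self.jobs.items()}
        status = {}
        for name in TopologicalSorter(graph).static_order():
            job = self.jobs[name]
            if any(status.get(dep) != "success" for dep in job.depends_on):
                status[name] = "skipped"  # a prerequisite did not succeed
                continue
            status[name] = "failed"
            for attempt in range(1, self.max_attempts + 1):
                if self.driver.run(job):
                    status[name] = "success"
                    break
                if attempt < self.max_attempts:
                    job.memory_mb *= 2  # raise the allotment before the next attempt
        return status


# Usage: "align" is simulated as needing at least 4000 MB, so it fails twice
# (1000 MB, then 2000 MB) and succeeds on the third attempt with 4000 MB.
def align(job: Job) -> bool:
    return job.memory_mb >= 4000


group = Group(cluster="local")
group.stage(Job("align", align, memory_mb=1000))
group.stage(Job("call_variants", lambda job: True, depends_on=["align"]))
result = group.execute()
print(result)  # {'align': 'success', 'call_variants': 'success'}
```

In this sketch, switching back ends means registering another driver and changing only the `cluster` argument, which mirrors the portability claim above. A real cluster driver would also have to submit jobs asynchronously and poll their status, which this in-process sketch omits.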