Fault Tolerant Lanczos Eigensolver via an Invariant Checking Method

Felix Loh, Kewal K. Saluja, Parameswaran Ramanathan

JOURNAL OF ELECTRONIC TESTING-THEORY AND APPLICATIONS (2021)

Abstract

An extensive survey of the literature shows that the Lanczos eigensolver is a popular iterative method for approximating a few maximal eigenvalues of a real symmetric matrix, particularly when the matrix is large and sparse. In recent years, graphics processing units (GPUs) have become a popular platform for scientific computing applications, many of which are based on linear algebra, and they are increasingly used as the main computational units in supercomputers. This trend is expected to continue as the number of computations required by scientific applications reaches the petascale and exascale range. In this paper, building on our earlier work [22], we investigate in detail the error checking mechanism for the Lanczos eigensolver. We identify a low-cost invariant for efficient error checking and, through mathematical analysis, determine the efficiency of our mechanism when used by the Lanczos eigensolver. We evaluate the proposed fault-tolerant scheme using an open-source sparse eigensolver on a GPU platform, with and without the injection of faults. We use a large number of sparse matrices from real applications to determine the efficiency and efficacy of our method. Our implementation shows that the proposed fault-tolerant method has good error coverage and low overhead. To the best of our knowledge, we are the first to introduce such a scheme for the Lanczos method.

Keywords

Fault tolerance, Invariant checking, Lanczos method, Sparse linear algebra, GPU
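
The abstract does not state which invariant the authors check, so the following is only a minimal sketch of the general idea of invariant checking inside a Lanczos iteration. It assumes a generic column-sum checksum on the matrix-vector product (the sum of the entries of A v must equal (1^T A) v), which costs one extra dot product per iteration. The function name `lanczos_checked`, the `fault_at` parameter used to inject an error, and the tolerance are hypothetical choices for illustration. A dense list-of-lists stands in for a sparse matrix, and the code runs on a CPU rather than a GPU.

```python
import math


def matvec(A, v):
    """Dense matrix-vector product (stand-in for a sparse SpMV kernel)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def lanczos_checked(A, v0, steps, fault_at=None, tol=1e-9):
    """Plain Lanczos recurrence with a checksum invariant on each A @ v.

    Returns (alphas, betas, detected), where detected is the index of the
    first iteration whose check failed, or None if every check passed.
    The checksum is an assumed, generic invariant, not the paper's.
    """
    n = len(A)
    # Precompute c = A^T 1 once; afterwards each check is one dot product.
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]

    norm0 = math.sqrt(dot(v0, v0))
    v = [x / norm0 for x in v0]
    v_prev = [0.0] * n
    beta = 0.0
    alphas, betas = [], []

    for j in range(steps):
        u = matvec(A, v)
        if fault_at == j:
            u[0] += 1.0  # hypothetical injected fault in the SpMV result

        # Invariant: sum(A v) == (1^T A) v, up to rounding error.
        expected = dot(col_sums, v)
        scale = 1.0 + sum(abs(x) for x in u)  # crude relative scaling
        if abs(sum(u) - expected) > tol * scale:
            return alphas, betas, j

        alpha = dot(u, v)
        w = [ui - alpha * vi - beta * pi for ui, vi, pi in zip(u, v, v_prev)]
        alphas.append(alpha)
        beta = math.sqrt(dot(w, w))
        if beta < 1e-12:  # invariant subspace found; stop early
            break
        betas.append(beta)
        v_prev, v = v, [x / beta for x in w]

    return alphas, betas, None


# Small symmetric tridiagonal example matrix.
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]

clean = lanczos_checked(A, [1.0, 0.0, 0.0], steps=3)
faulty = lanczos_checked(A, [1.0, 0.0, 0.0], steps=3, fault_at=1)
```

In this sketch the fault-free run completes with no check failing, while the run with an injected error stops at the iteration where the checksum disagrees. A checksum of this kind only covers the matrix-vector product; errors in the vector updates would need a separate check.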
