Fault Tolerance through Invariant Checking for the Lanczos Eigensolver

Felix Loh, Kewal K. Saluja, Parameswaran Ramanathan

2020 33rd International Conference on VLSI Design and 2020 19th International Conference on Embedded Systems (VLSID) (2020)

Abstract

The Lanczos eigensolver is a popular iterative method for approximating a few of the largest eigenvalues of a real symmetric matrix, particularly when the matrix is large and sparse. In recent years, graphics processing units (GPUs) have become a popular platform for scientific computing applications, many of which are based on linear algebra, and they are increasingly used as the main computational units in supercomputers. This trend is expected to continue as the number of computations required by scientific applications reaches the petascale and exascale range. In this paper, we introduce an efficient error-checking mechanism for the Lanczos eigensolver. To the best of our knowledge, we are the first to introduce such a scheme for the Lanczos method. We evaluate our fault-tolerant scheme using an open-source sparse eigensolver on a GPU platform, with and without the injection of faults. We use sparse matrices from real applications, and show that our fault-tolerant method has good error coverage and low overhead.
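
The abstract does not say which invariants the paper checks, so the following is only an illustrative sketch of the general idea, not the authors' scheme. It assumes one textbook property of the Lanczos recurrence: in exact arithmetic, each new Lanczos vector is orthogonal to the two vectors before it. The function name `lanczos_checked`, the `fault_at` parameter, the tolerance, and the test matrix are all hypothetical, and the code runs on the CPU in plain Python rather than on a GPU.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    # Dense matrix-vector product; a real solver would use a sparse format.
    return [dot(row, x) for row in A]

def lanczos_checked(A, m, tol=1e-8, fault_at=None):
    """Run up to m Lanczos steps on symmetric A and flag steps whose new
    vector is not orthogonal to the previous two (assumed invariant).
    Returns (alphas, betas, flagged_steps)."""
    n = len(A)
    v = [float(i + 1) for i in range(n)]      # deterministic start vector
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    v_prev = [0.0] * n
    beta = 0.0
    alphas, betas, flagged = [], [], []
    for j in range(m):
        w = matvec(A, v)
        if fault_at == j:
            w[0] += 1.0                       # hypothetical injected fault
        alpha = dot(w, v)
        # Three-term recurrence: remove components along v and v_prev.
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        beta_next = math.sqrt(dot(w, w))
        alphas.append(alpha)
        if beta_next < 1e-12:                 # breakdown: invariant subspace found
            break
        v_next = [wi / beta_next for wi in w]
        # Invariant check: v_next should be orthogonal to v and v_prev.
        if abs(dot(v_next, v)) > tol or abs(dot(v_next, v_prev)) > tol:
            flagged.append(j)
        betas.append(beta_next)
        v_prev, v, beta = v, v_next, beta_next
    return alphas, betas, flagged

def laplacian(n):
    # Symmetric tridiagonal test matrix (2 on the diagonal, -1 off it).
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]

A = laplacian(8)
_, _, clean_flags = lanczos_checked(A, m=4)
_, _, faulty_flags = lanczos_checked(A, m=4, fault_at=2)
```

In this sketch the fault-free run flags no steps, while the run with a corrupted matrix-vector product first flags the step where the fault was injected. The check costs two extra dot products per iteration, which is small next to the matrix-vector product; whether the paper uses this invariant or a different one is not stated in the abstract.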

Keywords

Lanczos eigensolver, Lanczos method, fault tolerant method, invariant checking, iterative method, maximal eigenvalues, symmetric matrix, graphics processing units, scientific computing applications, scientific applications, efficient error checking mechanism
