Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices

International Conference on Computational Science (ICCS 2017)

Abstract
A challenging class of problems arising in many GPU applications, called batched problems, involves linear algebra operations on many small matrices. To address them, we designed batched BLAS (Basic Linear Algebra Subroutines) routines, in particular the Level-2 BLAS GEMV and the Level-3 BLAS GEMM routines. Our batched BLAS design employs device functions and big-tile settings, and we adopted auto-tuning to optimize the different instances of the GEMV routines. We illustrate how our batched BLAS approach progressively optimizes the batched bidiagonalization on an NVIDIA K40c GPU. The optimization techniques in this paper are applicable to other two-sided factorizations as well. (C) 2017 The Authors. Published by Elsevier B.V.
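The central building block named in the abstract is a batched Level-2 BLAS GEMV, where a single kernel launch processes an entire batch of small matrices. The sketch below illustrates that idea under stated assumptions: one thread block per matrix and one thread per output row, column-major storage, and m no larger than the maximum block size. The kernel name, argument layout, and launch configuration are illustrative and are not taken from the paper's implementation.

```cuda
#include <cuda_runtime.h>

// Illustrative batched GEMV sketch: y_i = A_i * x_i for each small m-by-n matrix
// in the batch. One thread block handles one matrix (roughly the spirit of the
// "big-tile" setting described in the abstract), and each thread computes one
// row of the result. Column-major layout with leading dimension lda is assumed.
__global__ void batched_gemv_kernel(int m, int n,
                                    const double* const* dA_array, int lda,
                                    const double* const* dx_array,
                                    double* const* dy_array)
{
    const double* A = dA_array[blockIdx.x];  // matrix for this batch entry
    const double* x = dx_array[blockIdx.x];
    double*       y = dy_array[blockIdx.x];

    int row = threadIdx.x;
    if (row < m) {
        double sum = 0.0;
        for (int j = 0; j < n; ++j)          // A(row, j) = A[row + j*lda]
            sum += A[row + j * lda] * x[j];
        y[row] = sum;
    }
}

// Host-side launch: one block per matrix, m threads per block (m <= 1024 assumed).
void batched_gemv(int m, int n,
                  const double* const* dA_array, int lda,
                  const double* const* dx_array,
                  double* const* dy_array, int batch_count)
{
    batched_gemv_kernel<<<batch_count, m>>>(m, n, dA_array, lda, dx_array, dy_array);
}
```

In a real library the block shape and tiling of such kernels would be chosen by auto-tuning for each GEMV instance, which is the role auto-tuning plays in the approach summarized above.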
Keywords
Hardware accelerators, batched, two-sided factorization algorithms, singular value problems