FFT blitz: the tensor cores strike back

PPOPP(2021)

引用 8|浏览23
暂无评分
摘要
ABSTRACTThe fast Fourier Transform (FFT), a reduced-complexity formulation of the Discrete Fourier Transform (DFT), is an important tool in many areas of science and engineering. FFTW is a well-known package that follows this approach and is currently one of the fastest available implementations of the FFT. NVIDIA introduced its version of FFTW called cuFFT that achieves high performance on the GPUs. In this work we present a novel way to map the FFT algorithm on the newly introduced Tensor Cores by adapting the the Cooley-Tukey recursive FFT algorithm. We present four major types of optimizations that enhance the performance of our approach for varying FFT sizes and show that the approach consistently outperforms cuFFT with a speedup of about 15% to 250% on average.
更多
查看译文
关键词
GPU Computing,Performance Optimization,High-Performance Computing,Nonnegative Tensor Factorization
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要