Generating high performance pruned FFT implementations

Taipei (2009)

Abstract
We derive a recursive general-radix pruned Cooley-Tukey fast Fourier transform (FFT) algorithm in Kronecker product notation. The algorithm is compatible with the vectorization and parallelization required on state-of-the-art multicore CPUs. We incorporate the pruned FFT algorithm into the program generation system Spiral and automatically generate optimized implementations of the pruned FFT for the Intel Core2Duo multicore processor. Experimental results show that, when the problem size and the pattern of unused inputs and outputs are known in advance, the pruned FFT can speed up the fastest available FFT implementations by up to 30%.
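To make the idea of pruning concrete, here is a minimal sketch of input pruning in Python. It is not the general-radix, Kronecker-notation algorithm of the paper, nor code generated by Spiral. It is a simple radix-2 illustration under stated assumptions: the transform size n and the number k of leading nonzero inputs are both powers of two, and all remaining inputs are zero. The function names (fft, pruned_fft, naive_dft) are hypothetical. The sketch uses the identity y[r + m*q] = DFT_k(x[i] * w_n^(i*r))[q] with m = n/k, so it computes m transforms of size k instead of one transform of size n.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for q in range(n // 2):
        # Twiddle factor w_n^q applied to the odd-indexed half.
        t = cmath.exp(-2j * cmath.pi * q / n) * odd[q]
        out[q] = even[q] + t
        out[q + n // 2] = even[q] - t
    return out


def pruned_fft(x_head, n):
    """Size-n DFT of x_head followed by zeros (input pruning).

    Assumption for this sketch: k = len(x_head) and n are powers of two
    with k <= n. Only the k nonzero inputs are ever touched.
    """
    k = len(x_head)
    m = n // k
    y = [0j] * n
    for r in range(m):
        # Scale the nonzero inputs by w_n^(i*r), then take a size-k FFT.
        scaled = [x_head[i] * cmath.exp(-2j * cmath.pi * i * r / n)
                  for i in range(k)]
        sub = fft(scaled)
        # The q-th output of this small FFT is output r + m*q of the full DFT.
        for q in range(k):
            y[r + m * q] = sub[q]
    return y


def naive_dft(x):
    """O(n^2) reference DFT, used only to check the result."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * j / n) for i in range(n))
            for j in range(n)]
```

For example, pruned_fft([1, 2, 3, 4], 16) should agree, up to rounding error, with naive_dft([1, 2, 3, 4] + [0] * 12). In this sketch the saving comes from replacing roughly n log n work with n log k work, which only pays off when the pattern of zero inputs is known ahead of time, as the abstract notes.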
Keywords
state-of-the-art multicore cpus, optimized implementation, problem size, generating high performance, kronecker product notation, fft algorithm, recursive general-radix, program generation system spiral, fastest available fft implementation, intel core2duo multicore processor, kronecker product, fast fourier transform, discrete fourier transform, software performance, vector processing, pervasive computing, multiprocessing, fast fourier transforms, multicore processors, data mining, tensile stress, spirals, application software, multicore processing