A 32x32x32, spatially distributed 3D FFT in four microseconds on Anton

SC(2009)

Abstract
Anton, a massively parallel special-purpose machine for molecular dynamics simulations, performs a 32x32x32 FFT in 3.7 microseconds and a 64x64x64 FFT in 13.3 microseconds on a configuration with 512 nodes, an order of magnitude faster than all other FFT implementations of which we are aware. Achieving this FFT performance requires a coordinated combination of computation and communication techniques that leverage Anton's underlying hardware mechanisms. Most significantly, Anton's communication subsystem provides over 300 gigabits per second of bandwidth per node, message latency in the hundreds of nanoseconds, and support for word-level writes and single-ended communication. In addition, Anton's general-purpose computation system incorporates primitives that support the efficient parallelization of small 1D FFTs. Although Anton was designed specifically for molecular dynamics simulations, a number of the hardware primitives and software implementation techniques described in this paper may also be applicable to the acceleration of FFTs on general-purpose high-performance machines.
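The abstract's mention of small 1D FFTs rests on a standard property: a 3D FFT can be computed as three passes of 1D FFTs, one along each axis. The following is a minimal single-process sketch of that decomposition in plain Python. It is not Anton's implementation and models neither the communication between nodes nor the hardware primitives. The names `fft1d`, `fft3d`, and `points_per_node` are hypothetical, and the per-node count assumes the grid is spread evenly over the nodes, which the abstract does not state.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft3d(grid):
    """3D FFT of an n x n x n nested list, as three passes of 1D FFTs."""
    n = len(grid)
    g = [[[complex(v) for v in row] for row in plane] for plane in grid]
    # Pass 1: n*n independent 1D FFTs along z (the innermost index).
    for x in range(n):
        for y in range(n):
            g[x][y] = fft1d(g[x][y])
    # Pass 2: n*n independent 1D FFTs along y.
    for x in range(n):
        for z in range(n):
            line = fft1d([g[x][y][z] for y in range(n)])
            for y in range(n):
                g[x][y][z] = line[y]
    # Pass 3: n*n independent 1D FFTs along x.
    for y in range(n):
        for z in range(n):
            line = fft1d([g[x][y][z] for x in range(n)])
            for x in range(n):
                g[x][y][z] = line[x]
    return g


def points_per_node(n, nodes):
    """Grid points per node, ASSUMING an even spatial distribution."""
    return n ** 3 // nodes
```

Under the even-distribution assumption, `points_per_node(32, 512)` gives 64 and `points_per_node(64, 512)` gives 512. Each pass over a 32x32x32 grid consists of 32 x 32 = 1024 independent 32-point 1D FFTs, so on a distributed machine the cost of moving data between passes matters as much as the arithmetic.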
Keywords
single-ended communication, communication technique, leverage Anton, communication subsystem, FFT performance, molecular dynamics simulation, hardware primitive, general-purpose high-performance machine, FFT implementation, general-purpose computation system, high performance computing, parallel computing