Performance analysis and comparison of cellular automata GPU implementations

Cluster Computing (2017)

Abstract
Cellular automata (CA) models are of interest to several scientific areas, and there is growing interest in exploring large systems that require high-performance computing. This work presents a CA implementation that performs well on five different NVIDIA GPU architectures, from Tesla to Maxwell, simulating systems with up to a billion cells. Using the game of life (GoL) and a more complex variation of GoL as examples, a performance of 5.58e6 evaluated cells/s is achieved. The two optimizations most often used in previous studies are shared memory and Multicell algorithms. Here, these optimizations do not improve performance on Fermi or newer architectures. Compared to an optimized serial CPU implementation, the GoL CA code running on an NVIDIA Titan X obtained a speedup of up to ~85×, and up to ~230× for the more complex CA. Finally, the efficiency of each GPU is analyzed in terms of cell performance per transistor and cell performance per unit of bandwidth, showing how the architectures improved for this particular problem.
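As a rough illustration of the metric quoted above, the following minimal sketch advances a GoL grid on the CPU and reports evaluated cells per second. It is not the paper's implementation: the function names `gol_step` and `evaluated_cells_per_second`, the periodic boundary conditions, and the list-of-lists grid layout are all assumptions made for this example.

```python
import time


def gol_step(grid):
    """Advance a Game of Life grid by one generation.

    Assumption: periodic (wrap-around) boundaries; the abstract does not
    state which boundary conditions the paper uses.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight Moore neighbours, wrapping at the edges.
            n = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # Conway's rule: birth on 3 neighbours, survival on 2 or 3.
            new[r][c] = 1 if n == 3 or (grid[r][c] == 1 and n == 2) else 0
    return new


def evaluated_cells_per_second(grid, steps):
    """Run `steps` generations; return (final grid, evaluated cells per second)."""
    start = time.perf_counter()
    for _ in range(steps):
        grid = gol_step(grid)
    elapsed = time.perf_counter() - start
    cells = len(grid) * len(grid[0]) * steps  # every cell is evaluated each step
    rate = cells / elapsed if elapsed > 0 else float("inf")
    return grid, rate


# A "blinker" oscillates with period 2, which makes a simple sanity check.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
final, rate = evaluated_cells_per_second(blinker, 2)
```

Under this definition, a speedup figure such as those in the abstract is the ratio of two such rates measured on the same system size, for example GPU rate divided by serial CPU rate.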
Keywords
Cellular automata, Graphics processing unit, Benchmarks, Performance analysis