Time-series ML-regression on Graphcore IPU-M2000 and Nvidia A100

Jan Balewski, Zhenying Liu, Alexander Tsyplikhin, Manuel Lopez Roland, Kristofer Bouchard

2022 IEEE/ACM International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), 2022

Abstract
We compare the ML-training performance of a Graphcore IPU-M2000-based system with an Nvidia A100 GPU-based system on the Perlmutter HPC machine at NERSC/LBL. The multivariate regression of time-series data from a simulated biological neuron was the scientific benchmark problem. The ML model consisted of several convolutional, batch normalization, and fully connected layers. The training data were distributed in CPU memory to eliminate the system-dependent IO cost. The data-parallel training runs achieved the same sample throughput on both GC200 IPUs and A100 GPUs for any choice of the number of accelerators between 1 and 256. The best MSE validation loss achieved on IPUs was only 10% to 20% larger. The aggregated energy use per training epoch was between 2.5 and 3 times smaller for the Graphcore system than for the Nvidia system. This paper also discusses aspects of software-hardware co-design to achieve the highest efficiency on the IPU using PopTorch.
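The abstract describes the benchmark model only as a stack of convolutional, batch normalization, and fully connected layers applied to multivariate time-series regression. As an illustration of that model family, here is a minimal NumPy forward-pass sketch; all shapes (4 input channels, 128 time steps, 3 regression targets) and layer sizes are hypothetical, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """Valid 1-D convolution: x is (C_in, T), w is (C_out, C_in, K), b is (C_out,)."""
    c_out, c_in, k = w.shape
    t_out = x.shape[1] - k + 1
    out = np.zeros((c_out, t_out))
    for t in range(t_out):
        # Contract kernel against the current time window for all output channels.
        out[:, t] = np.tensordot(w, x[:, t:t + k], axes=([1, 2], [0, 1])) + b
    return out

def batchnorm(x, eps=1e-5):
    """Per-channel normalization over the time axis (inference-style sketch)."""
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical input: 4 channels of simulated-neuron time series, 128 steps.
x = rng.standard_normal((4, 128))
w1 = rng.standard_normal((8, 4, 5)) * 0.1   # conv layer: 8 filters of width 5
b1 = np.zeros(8)
h = relu(batchnorm(conv1d(x, w1, b1)))       # shape (8, 124)

# Fully connected head producing 3 continuous outputs (multivariate regression).
w_fc = rng.standard_normal((3, h.size)) * 0.01
y = w_fc @ h.ravel()
print(y.shape)  # (3,)
```

In the actual benchmark such a model would be trained with an MSE loss and replicated across accelerators for data-parallel training; this sketch only shows the conv → batchnorm → dense layer pattern named in the abstract.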
Keywords
ML-regression, GPU A100, IPU M2000, power usage, weak scaling, strong scaling, IO optimization, PopTorch, disaggregation