pommDNN: Performance optimal GPU memory management for deep neural network training

Future Generation Computer Systems: The International Journal of eScience (2024)

Abstract
It is known that deep neural network (DNN) models achieve higher accuracy with deeper structures, but deep structures combined with limited GPU memory restrict model training to very small batch sizes, underutilizing the GPU's computational power. GPU memory management based on swapping tensors between GPU and CPU memory can reduce the memory footprint of DNN models, enabling the GPU to train models with a larger batch size. However, because of inappropriate tensor swapping schemes, existing approaches slow down model training when the batch size is expanded.

In this paper, we propose pommDNN, a performance-optimal GPU memory management method that improves training speed through batch size expansion; the optimal batch size is selected by predicting the throughput of DNN model training at each expanded batch size. pommDNN trades off the performance gained by batch size expansion against the communication overhead caused by tensor swapping when selecting the optimal batch size. We design a genetic-algorithm-based search for the optimal tensor swapping scheme, driven by the DNN computational graph and the result of the optimal batch size selection. Our experiments show that, for DNN models of different depths, pommDNN improves the throughput of network training by 1% to 57%, outperforming other tensor-swapping-based methods on most models.
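To make the described procedure concrete, the following is a minimal, hedged sketch of the two ideas in the abstract: estimating per-iteration time for a candidate tensor swapping scheme, searching swap schemes with a genetic algorithm, and picking the batch size with the best predicted throughput. Every function name, cost model, and parameter below (predicted_iter_time, ga_search, the linear transfer-cost and memory terms) is an illustrative assumption, not pommDNN's actual prediction model or search configuration.

```python
# Hedged sketch: a simplified genetic-algorithm search over tensor-swap schemes,
# illustrating the tradeoff described in the abstract (performance gained from a
# larger batch vs. communication overhead of swapping). All cost terms and
# parameters here are illustrative assumptions, not the paper's actual model.
import random

def predicted_iter_time(swap_mask, compute_time, swap_costs, mem_usage, mem_limit):
    """Estimate one training-iteration time for a candidate swap scheme.

    swap_mask[i] == 1 means tensor i is swapped to CPU memory after its last
    forward use and brought back before its backward use (simplified model).
    """
    resident = sum(m for m, s in zip(mem_usage, swap_mask) if s == 0)
    if resident > mem_limit:            # infeasible: does not fit in GPU memory
        return float("inf")
    # Assume swap transfers that cannot be overlapped with compute add latency.
    overhead = sum(c for c, s in zip(swap_costs, swap_mask) if s == 1)
    return compute_time + overhead

def ga_search(compute_time, swap_costs, mem_usage, mem_limit,
              pop_size=32, generations=100, mutation_rate=0.05):
    """Search for a swap scheme that minimizes predicted iteration time."""
    n = len(swap_costs)
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    fitness = lambda ind: predicted_iter_time(ind, compute_time, swap_costs,
                                              mem_usage, mem_limit)
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[:pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < mutation_rate else g
                     for g in child]             # bit-flip mutation
            children.append(child)
        pop = survivors + children
    best = min(pop, key=fitness)
    return best, fitness(best)

# Toy usage: pick the batch size whose best swap scheme yields the highest
# predicted throughput (samples per second), not merely one that fits in memory.
if __name__ == "__main__":
    random.seed(0)
    mem_limit = 16.0                               # GB, illustrative
    best_choice = None
    for batch in (32, 64, 128):
        scale = batch / 32
        compute_time = 0.04 + 0.002 * batch        # assumed compute-time model
        swap_costs = [0.0005 * scale] * 40         # assumed per-tensor transfer cost
        mem_usage = [0.4 * scale] * 40             # assumed per-tensor GPU footprint
        _, iter_time = ga_search(compute_time, swap_costs, mem_usage, mem_limit)
        throughput = batch / iter_time if iter_time != float("inf") else 0.0
        if best_choice is None or throughput > best_choice[1]:
            best_choice = (batch, throughput)
    print("selected batch size:", best_choice[0])
```

In this toy setup the GA tends to swap only as many tensors as needed to fit the GPU memory limit, since every extra swap adds transfer overhead to the fitness value; the outer loop then compares batch sizes by their best predicted throughput, mirroring the tradeoff the method is built around.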
Keywords
Model training, GPU memory management, Tensor swapping, Optimal performance, Training throughput improvement, Batch size selection