Multi-level Storage Optimization for Intermediate Data in AI Model Training

Databases Theory and Applications, ADC 2023 (2024)

Abstract
As Transformer-based large models become the mainstream of AI training, the development of hardware devices (e.g., GPUs) cannot keep up with the rapid growth of model scale. Although various parallel training techniques enable models to be trained across multiple GPUs, the cost remains too high for most researchers. This rising hardware threshold for AI model training has limited the broader application of deep learning. In fact, CPU memory and external disk storage can serve as caches, reducing the consumption of expensive GPU memory. In this paper, we analyze two types of intermediate data used in AI model training and propose a multi-level offloading policy for intermediate data in the training process. First, we propose a dynamic management policy based on a warm-up phase that optimizes GPU memory usage according to the characteristics of the AI training process. Second, we asynchronously offload a specified ratio of the optimizer state data to the HDD, which further optimizes CPU memory usage. We conduct experiments on the large pre-trained model GPT-2 to verify the effectiveness of our method, and the results indicate that multi-level storage optimization of intermediate data helps train larger AI models under constrained hardware resources.
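The second idea in the abstract, asynchronously offloading a specified ratio of optimizer state from CPU memory to the HDD, can be illustrated with a minimal PyTorch sketch. Everything below (the OptimizerStateSpiller class, the offload_ratio parameter, and the spill directory) is a hypothetical illustration under the assumption that the optimizer state already resides in CPU memory; it is not the paper's implementation and omits the warm-up-based GPU memory management policy.

```python
# Minimal sketch: spill a fraction of optimizer state tensors to disk in a
# background thread, then reload them before the next optimizer step.
# Hypothetical names; not the paper's actual code.
import os
import torch
from concurrent.futures import ThreadPoolExecutor


class OptimizerStateSpiller:
    """Offload a configurable share of optimizer state tensors to disk."""

    def __init__(self, optimizer, spill_dir="./opt_state_spill", offload_ratio=0.5):
        self.optimizer = optimizer
        self.spill_dir = spill_dir
        self.offload_ratio = offload_ratio                  # share of state tensors sent to disk
        self.executor = ThreadPoolExecutor(max_workers=1)   # single background I/O thread
        self.spilled = []                                   # (param, state_name, path, future)
        os.makedirs(spill_dir, exist_ok=True)

    def offload(self):
        """Asynchronously write the chosen fraction of state tensors to disk.

        Assumes optimizer.step() has run at least once so that state
        (e.g. Adam's exp_avg / exp_avg_sq) is populated.
        """
        items = [
            (p, name, t)
            for p, state in self.optimizer.state.items()
            for name, t in state.items()
            if torch.is_tensor(t)
        ]
        n_spill = int(len(items) * self.offload_ratio)
        for idx, (p, name, t) in enumerate(items[:n_spill]):
            path = os.path.join(self.spill_dir, f"state_{idx}_{name}.pt")
            future = self.executor.submit(torch.save, t, path)  # non-blocking write
            self.optimizer.state[p][name] = None                # drop the in-memory copy
            self.spilled.append((p, name, path, future))

    def restore(self):
        """Block on pending writes and reload tensors before optimizer.step()."""
        for p, name, path, future in self.spilled:
            future.result()                                     # ensure the file is complete
            self.optimizer.state[p][name] = torch.load(path)
        self.spilled.clear()
```

A typical usage of this sketch would be to call offload() right after optimizer.step(), overlap the disk writes with the forward and backward passes of the next batch, and call restore() before the following step(), so that disk latency is hidden behind computation.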
Keywords
Heterogeneous training, Multi-level storage, Large models