Training of Deep Learning Pipelines on Memory-Constrained GPUs via Segmented Fused-Tiled Execution

CC'22: Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction (2022)

Abstract
Training models with massive inputs is a significant challenge in the development of Deep Learning pipelines that process very large digital images, as required by Whole Slide Imaging (WSI) in computational pathology and by analysis of brain fMRI images in computational neuroscience. Graphics Processing Units (GPUs) are the primary workhorse for training and inference of Deep Learning models. To run inference or training of a neural network pipeline on GPUs, state-of-the-art machine learning frameworks like PyTorch and TensorFlow currently require that the collective memory on the GPUs be larger than the size of the activations at any stage in the pipeline. Existing Deep Learning pipelines for these use cases have therefore been forced into sub-optimal "patch-based" modeling approaches, where each image is processed in small patches. In this paper, we present a solution to this problem that employs tiling in conjunction with checkpointing, enabling arbitrarily large images to be processed directly, irrespective of the size of global memory on a GPU and the number of available GPUs. Experimental results using PyTorch demonstrate enhanced functionality and performance over existing frameworks.
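To illustrate the core idea, here is a minimal PyTorch sketch combining spatial tiling with activation checkpointing: each tile's intermediate activations are discarded after the forward pass and recomputed during the backward pass, so peak GPU memory scales with one tile rather than the full image. The TiledConvStage module, the 1x1-convolution stage, and the tile size are illustrative assumptions, not the paper's implementation, which additionally fuses pipeline stages and handles the halo regions needed by overlapping receptive fields.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class TiledConvStage(nn.Module):
    """Sketch: apply a small conv stage tile-by-tile with checkpointing.

    Hypothetical illustration only. Restricting the stage to 1x1
    convolutions keeps tiled execution exact (no cross-tile receptive
    field); larger kernels would need halo exchange between tiles.
    """
    def __init__(self, channels=3, tile_size=256):
        super().__init__()
        self.stage = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=1),  # 1x1 conv: no halo needed
            nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=1),
        )
        self.tile = tile_size

    def forward(self, x):
        n, c, h, w = x.shape
        rows = []
        for i in range(0, h, self.tile):
            cols = []
            for j in range(0, w, self.tile):
                patch = x[:, :, i:i + self.tile, j:j + self.tile]
                # checkpoint() drops the tile's intermediate activations
                # and recomputes them in the backward pass, so peak memory
                # is bounded by one tile rather than the whole image.
                cols.append(checkpoint(self.stage, patch, use_reentrant=False))
            rows.append(torch.cat(cols, dim=3))  # stitch tiles along width
        return torch.cat(rows, dim=2)            # stitch rows along height

if __name__ == "__main__":
    model = TiledConvStage()
    # An input large enough that storing all activations at once would be costly.
    x = torch.randn(1, 3, 1024, 1024, requires_grad=True)
    out = model(x)
    out.mean().backward()
    print(x.grad.shape)  # gradients flow back through every tile
```

The trade-off is the standard one for checkpointing: roughly one extra forward computation per tile in exchange for activation memory bounded by the tile size; the paper's segmented fused-tiled execution automates this partitioning across pipeline stages.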
Keywords
DNN, GPU, Large image training, Fusion, Tiling, Memory-constrained execution, Checkpointing