A Unified CPU-GPU Protocol for GNN Training
arXiv (2024)
Abstract
Training a Graph Neural Network (GNN) model on large-scale graphs involves a
high volume of data communication and computation. While state-of-the-art
CPUs and GPUs feature high computing power, the standard GNN training protocol
adopted in existing GNN frameworks cannot efficiently utilize the platform
resources. To this end, we propose a novel Unified CPU-GPU protocol that can
improve the resource utilization of GNN training on a CPU-GPU platform. The
Unified CPU-GPU protocol instantiates multiple GNN training processes in
parallel on both the CPU and the GPU. By allocating training processes on the
CPU to perform GNN training collaboratively with the GPU, the proposed protocol
improves the platform resource utilization and reduces the CPU-GPU data
transfer overhead. Because CPU and GPU performance differ, we develop
a novel load balancer that dynamically balances the workload between CPUs and
GPUs at runtime. We evaluate our protocol using two representative GNN
sampling algorithms, with two widely-used GNN models, on three datasets.
Compared with the standard training protocol adopted in the state-of-the-art
GNN frameworks, our protocol improves resource utilization and reduces
overall training time. On a platform where the GPU moderately outperforms the
CPU, our protocol speeds up GNN training by up to 1.41x. On a platform where
the GPU significantly outperforms the CPU, our protocol speeds up GNN training
by up to 1.26x. Our protocol is open-source and can be seamlessly integrated
into state-of-the-art GNN frameworks to accelerate GNN training. It particularly
benefits users with limited GPU access, given the high demand for GPUs.
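
The abstract does not specify how the load balancer works. As a minimal sketch of one common approach, and not the authors' algorithm, the workload of the next iteration can be split among trainers in proportion to the throughput each one achieved in the previous iteration. The function name `rebalance` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import List


def rebalance(total: int, prev_sizes: List[int], prev_times: List[float]) -> List[int]:
    """Split `total` work items across trainers in proportion to measured throughput.

    Hypothetical helper, not taken from the paper.
    prev_sizes[i]: number of items trainer i processed in the last iteration.
    prev_times[i]: seconds trainer i needed for those items.
    """
    if not prev_sizes or len(prev_sizes) != len(prev_times):
        raise ValueError("prev_sizes and prev_times must be non-empty and equal in length")
    if any(s <= 0 for s in prev_sizes) or any(t <= 0 for t in prev_times):
        raise ValueError("sizes and times must be positive")

    # Throughput in items per second for each trainer.
    throughput = [s / t for s, t in zip(prev_sizes, prev_times)]
    scale = total / sum(throughput)

    # Ideal (fractional) share for each trainer, rounded down.
    ideal = [thr * scale for thr in throughput]
    sizes = [int(x) for x in ideal]

    # Hand out the items lost to rounding, largest fractional part first,
    # so that the shares always sum to `total`.
    leftover = total - sum(sizes)
    by_fraction = sorted(range(len(ideal)), key=lambda i: ideal[i] - sizes[i], reverse=True)
    for i in by_fraction[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    # One CPU trainer and one GPU trainer that each processed 512 seed
    # nodes; the GPU trainer finished three times faster.
    print(rebalance(1024, [512, 512], [3.0, 1.0]))
```

Calling this once per iteration with fresh timings lets the split follow runtime changes in relative CPU and GPU speed. A real balancer would likely also smooth the measurements to avoid oscillation, which this sketch omits.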