Optimize DGL Operations on x86-64 Multi-Core Processors.

International Conference on High Performance Compilation, Computing and Communications (HP3C), 2022

Abstract
Modern x86-64 processors achieve strong performance thanks to their long vector units, which are widely exploited in CNN-style neural network inference. However, GNN inference on these processors performs poorly, and as GNN models grow and graph datasets get larger, GNN inference performance faces serious challenges on the x86-64 platform. In this paper, we study the poor performance of DGL-based GAT models on x86-64 and analyze the main bottlenecks. To optimize DGL on the two main x86-64 CPU platforms, Intel and AMD, we implement a simple and effective task allocator that balances the load among multiple cores, and we use vector instructions to optimize the core operators in DGL. In addition, we propose corresponding optimizations for the NUMA architecture. Experimental results show that our optimization method improves the performance of the baseline DGL version by up to 3.12x on the Intel platform and 2.6x on the AMD platform.
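The abstract does not spell out how the task allocator balances load, but for irregular graphs the key idea in schemes of this kind is to split work by edge count rather than node count, since a few high-degree nodes can otherwise dominate one core. The sketch below is purely illustrative (the function name and greedy prefix-sum strategy are assumptions, not the paper's actual allocator): it partitions a node range into contiguous per-core chunks with roughly equal total in-degree.

```python
# Illustrative sketch, NOT the paper's actual allocator: balance the
# neighbor-aggregation workload of a GNN layer across cores by giving
# each core a contiguous node range with roughly equal total edge count,
# instead of an equal number of nodes.

def balance_by_edges(in_degrees, num_cores):
    """Greedily split node range [0, n) into num_cores contiguous
    chunks whose total in-degrees are approximately equal."""
    total = sum(in_degrees)
    target = total / num_cores  # ideal edge count per core
    chunks, start, acc = [], 0, 0
    for i, d in enumerate(in_degrees):
        acc += d
        # Close the current chunk once it reaches its share of the
        # edges, keeping at least one chunk slot for remaining cores.
        if acc >= target and len(chunks) < num_cores - 1:
            chunks.append((start, i + 1))
            start, acc = i + 1, 0
    chunks.append((start, len(in_degrees)))
    return chunks

# A skewed degree distribution: node 4 carries most of the edges, so an
# equal-node split (4 nodes per core) would leave one core with ~96% of
# the work, while the edge-balanced split isolates the heavy node.
degrees = [1, 1, 1, 1, 100, 1, 1, 1]
print(balance_by_edges(degrees, 2))  # → [(0, 5), (5, 8)]
```

The same principle extends to NUMA placement: once chunks are edge-balanced, pinning each worker to the socket that owns its chunk's feature rows avoids remote-memory traffic.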