Design and performance evaluation of UCX for the Tofu Interconnect D on Fugaku towards efficient multithreaded communication

The Journal of Supercomputing (2024)

Abstract
The growing prevalence of manycore processors makes multithreaded communication more important for avoiding costly global synchronization among cores. One representative approach that requires multithreaded communication is the global task-based programming model. In this model, a program is divided into tasks that each node executes asynchronously, and independent thread-to-thread communication is expected. However, approaches based on the Message Passing Interface (MPI) are not efficient because of design issues. In this research, we design and implement the utofu transport layer in Unified Communication X (UCX), an abstracted communication library, for efficient remote direct memory access (RDMA) based multithreaded communication on Tofu Interconnect D. Evaluation results on Fugaku show that UCX significantly improves multithreaded performance over MPI while maintaining portability between systems. UCX shows about 32.8 times lower latency than Fujitsu MPI with 24 threads in the multithreaded ping-pong benchmark and about 37.8 times higher update rate than Fujitsu MPI with 24 threads on 256 nodes in the multithreaded GUPS benchmark.
Keywords
Supercomputer Fugaku, A64FX, Tofu Interconnect D, UCX, Multithreaded communication, One-sided communication