Bridge-NDP: Achieving Efficient Communication-Computation Overlap in Near Data Processing with Bridge Architecture

ASPDAC '24: Proceedings of the 29th Asia and South Pacific Design Automation Conference (2024)

Abstract
Near data accelerators (NDAs) enable near data processing (NDP) within main memory, which improves performance by providing more aggregate bandwidth and reducing long-distance data transfer. Most prior work focuses on exploiting higher internal bandwidth to improve the performance of the NDA itself. However, the overhead of interactive communication between the host and the NDAs has been overlooked, and it has become the bottleneck of NDP systems. In this paper, we propose bridge-NDP, a novel NDP architecture that uses existing memory buses as bridge buses to fully utilize bandwidth. With bridge access enabled by optimized bridge commands, bridge-NDP efficiently overlaps communication and computation. It can be applied to existing NDP systems regardless of the memory level the NDAs are attached to. For a variety of key computing kernels from machine learning, data analytics, and other domains, our evaluation shows that bridge-NDP speeds up not only the NDA itself (1.13×-3.62×) but also host-NDA collaboration (2.43×-4.21×), and achieves higher bandwidth utilization (1.12×-3.67× and 1.48×-4.13×, respectively) than the state-of-the-art NDP solution.
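To make the idea of communication-computation overlap concrete, the following is a minimal timing sketch. It is not the paper's model or mechanism: the function names, the per-chunk timing lists, and the assumption of unlimited buffering on the accelerator side are all hypothetical, chosen only to illustrate why hiding transfer time behind compute time shortens total execution time.

```python
# Illustrative timing model only; not taken from the bridge-NDP paper.
# Assumptions (hypothetical): work is split into chunks, each chunk needs one
# transfer followed by one compute step, transfers are serialized on a single
# bus, and the accelerator can buffer chunks that arrive early.

def serial_time(comm, comp):
    """Total time when every transfer and every compute step run back to back."""
    return sum(comm) + sum(comp)

def overlapped_time(comm, comp):
    """Total time when the transfer of later chunks overlaps earlier computes."""
    transfer_done = 0.0  # time at which the current chunk has fully arrived
    compute_done = 0.0   # time at which the previous chunk finished computing
    for c, k in zip(comm, comp):
        transfer_done += c  # the bus handles one transfer at a time
        # Compute starts once the data has arrived and the unit is free.
        compute_done = max(compute_done, transfer_done) + k
    return compute_done

if __name__ == "__main__":
    comm = [2, 2, 2, 2]  # per-chunk transfer time (arbitrary units)
    comp = [3, 3, 3, 3]  # per-chunk compute time (arbitrary units)
    print(serial_time(comm, comp))      # 20
    print(overlapped_time(comm, comp))  # 14.0
```

In this toy setting only the first transfer is exposed; the remaining transfers are hidden behind computation, so the total drops from 20 to 14 time units. When transfers take longer than computes, the bus becomes the limiting resource and the benefit shrinks accordingly.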
Key words
Bridge Architecture, Machine Learning, Data Transfer, Communication Overhead, Bandwidth Utilization, Time Constraints, Communication Time, Memory Devices, Computing Units, Channel Bandwidth, Total Execution Time, Data Bus