Bridge-NDP: Achieving Efficient Communication-Computation Overlap in Near Data Processing with Bridge Architecture

2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)(2024)

Abstract
Near data accelerators (NDAs) enable near data processing (NDP) within main memory. NDP improves performance by providing higher aggregate bandwidth and by reducing long-distance data transfers. Most prior work focuses on exploiting higher internal bandwidth to improve the performance of the NDA itself. However, the overhead of interactive communication between the host and the NDAs has been overlooked, and it has become the bottleneck of NDP systems. In this paper, we propose bridge-NDP, a novel NDP architecture that uses existing memory buses as bridge buses to fully utilize the available bandwidth. With bridge access enabled by optimized bridge commands, bridge-NDP efficiently overlaps communication and computation. It can be applied to existing NDP systems regardless of the memory level to which the NDAs are attached. For a variety of key computing kernels from machine learning, data analytics, and other domains, our evaluation compares bridge-NDP with the state-of-the-art NDP solution. Bridge-NDP speeds up not only the NDA itself (1.13×-3.62×) but also host-NDA collaboration (2.43×-4.21×), and it achieves higher bandwidth utilization (1.12×-3.67× and 1.48×-4.13×).
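The benefit of overlapping communication with computation can be illustrated with a toy timing model. This is a generic sketch, not the bridge-NDP mechanism or its bridge commands, which the abstract does not detail. The names `Tile`, `serialized_time`, and `overlapped_time` are hypothetical. Times are in arbitrary units. The model assumes one data-transfer channel, one compute unit, and enough buffer space that transfers never stall. Without overlap, every transfer and every compute step adds to the total. With overlap, the transfer of the next chunk proceeds while the current chunk is being computed.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    """One chunk of work (hypothetical): time to move its data and time to compute on it."""
    transfer: float
    compute: float


def serialized_time(tiles):
    # No overlap: each tile's transfer and compute run back to back,
    # and the next tile's transfer waits for the previous compute to finish.
    return sum(t.transfer + t.compute for t in tiles)


def overlapped_time(tiles):
    # Two-stage pipeline: the channel moves tiles one after another while the
    # compute unit processes earlier tiles. Assumes unlimited buffering, so a
    # transfer never waits for the compute unit to free a buffer.
    transfer_done = 0.0
    compute_done = 0.0
    for t in tiles:
        transfer_done += t.transfer                # channel is busy sequentially
        start = max(transfer_done, compute_done)   # need the data and a free compute unit
        compute_done = start + t.compute
    return compute_done


if __name__ == "__main__":
    tiles = [Tile(transfer=2.0, compute=3.0) for _ in range(4)]
    print("serialized:", serialized_time(tiles))
    print("overlapped:", overlapped_time(tiles))
```

In this model, with four tiles that each take 2 units to transfer and 3 units to compute, serialized execution takes 20 units. Overlapped execution takes 14 units: the first transfer plus all of the computation. Once the pipeline is full, the shorter stage is hidden behind the longer one.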
Keywords
Bridge Architecture,Machine Learning,Data Transfer,Communication Overhead,Bandwidth Utilization,Time Constraints,Communication Time,Memory Devices,Computing Units,Channel Bandwidth,Total Execution Time,Data Bus