Near-Stream Computing: General and Transparent Near-Cache Acceleration

2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

Abstract
Data movement and communication have become the primary bottlenecks in large multicore systems. The near-data computing paradigm offers a solution: move computation to where the data resides on-chip. Two challenges keep near-data computing out of the mainstream: lack of programmer transparency and lack of broad applicability. Programmer transparency requires providing sequential memory semantics with distributed computation, which demands burdensome coordination. Broad applicability requires support for arbitrary combinations of address patterns (e.g., affine, indirect, multi-operand) and computation types (loads, stores, reductions, atomics).

We find that streams, which are coarse-grain memory access patterns, are a powerful ISA abstraction for near-data offloading. Tracking data access at stream granularity greatly reduces the coordination burden of providing sequential semantics, and decomposing the problem using streams allows arbitrary combinations of address and computation patterns, giving broad generality.

With this insight, we develop a paradigm called near-stream computing, comprising a compiler, a CPU ISA extension, and a microarchitecture that together enable programmer-transparent computation offloading to shared caches. We evaluate our system on OpenMP kernels that stress broad addressing and compute behavior, and find that 46% of dynamic instructions can be offloaded to remote cache banks, reducing network traffic by 76%. Overall, near-stream computing achieves a 2.13× speedup over a state-of-the-art near-data computing technique, with a 1.90× energy-efficiency gain.
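To make the address/compute combinations concrete, the following is a minimal OpenMP-style sketch in the spirit of the kernels the abstract describes; it is our own illustration (all names are hypothetical), not code from the paper. It mixes affine load streams (sequential reads of idx and val) with an indirect atomic-update stream (hist[idx[i]]), the kind of pattern that near-stream computing would offload so the update executes at the cache bank holding each hist element rather than moving the data to the core.

```c
/* Illustrative sketch only; not code from the paper.
 * idx[i] and val[i] are affine (sequential) load streams;
 * hist[idx[i]] is an indirect atomic-update stream. */
#include <stdio.h>

static void indirect_atomic_add(const int *idx, const double *val,
                                double *hist, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        /* Under near-stream offloading, this read-modify-write would be
         * performed at the remote cache bank owning hist[idx[i]]. */
        #pragma omp atomic
        hist[idx[i]] += val[i];
    }
}

int main(void) {
    int idx[8] = {0, 1, 0, 2, 1, 0, 2, 1};
    double val[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    double hist[3] = {0};
    indirect_atomic_add(idx, val, hist, 8);
    printf("%g %g %g\n", hist[0], hist[1], hist[2]);  /* prints: 10 15 11 */
    return 0;
}
```

In a conventional system, each atomic update pulls the hist cache line across the network to the requesting core; tracking the loop as a few streams lets the hardware ship the computation instead, which is the source of the traffic reduction the abstract reports.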
Key words
Stream-Based ISAs, Programmer-Transparent Acceleration, Near-Data Computing