PUMICE: Processing-using-Memory Integration with a Scalar Pipeline for Symbiotic Execution.

Socrates S. Wong, Cecilio C. Tamarit, José F. Martínez

DAC (2023)

Abstract
Existing SIMD extensions in scalar CPUs (e.g., SSE, AVX) can leverage instruction-level parallelism (ILP) because of their tight integration with the CPU pipeline. However, the vectors they employ are quite short, and this limits their ability to exploit data-level parallelism (DLP). On the other hand, processing-using-memory (PUM) accelerators are capable of exploiting massive amounts of DLP, as they typically perform computation on very long vectors (tens of thousands of elements) within the memory itself. Recent work demonstrates that order-of-magnitude speedups can be achieved by these architectures for a variety of workloads over area-equivalent multicore CPUs with SIMD extensions. Still, PUM architectures are largely decoupled from the CPU itself, thereby limiting their ability to tap the CPU's ILP the way SIMD extensions do. In this paper, we propose PUMICE, a tightly integrated CPU-PUM architecture that simultaneously exploits DLP and ILP for very long vector operations. As a result of this tight integration, PUMICE delivers significant performance gains: our experimental results show speedups of up to 2.2x (1.4x on average) over a state-of-the-art decoupled approach.
Keywords
Associative processing, associative memory, vector processors