A Comprehensive Analytical Performance Model of DRAM Caches.

ICPE (2015)

Abstract
Stacked DRAM promises to offer unprecedented capacity and bandwidth to multi-core processors at moderately lower latency than off-chip DRAM. A typical use of this abundant DRAM is as a large last-level cache. Prior work is divided on how to organize this cache, and the proposed organizations fall into one of two categories: (i) a Tags-In-DRAM organization, with the cache organized as small blocks (typically 64B) and metadata (tags, valid, dirty, recency, and coherence bits) stored in DRAM, and (ii) a Tags-In-SRAM organization, with the cache organized as larger blocks (typically 512B or larger) and metadata stored in SRAM. Tags-In-DRAM organizations tend to incur higher latency but conserve off-chip bandwidth, while Tags-In-SRAM organizations incur lower latency at the cost of some additional bandwidth. In this work, we develop a unified performance model of the DRAM cache that covers these different organizational styles. The model is validated against detailed architecture simulations and shown to have average latency estimation errors of 10.7% and 8.8% in 4-core and 8-core processors, respectively. We also explore two insights from the model: (i) the need for very high hit rates in the metadata cache/predictor (commonly employed in Tags-In-DRAM designs) to reduce latency, and (ii) opportunities for reducing latency by load-balancing the DRAM cache and main memory.
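The first insight, that Tags-In-DRAM latency depends strongly on the metadata cache hit rate, can be illustrated with a toy expected-latency calculation. This is only a minimal sketch and not the model developed in the paper: the function name, its parameters, and the cycle counts in the usage note are all hypothetical, and the sketch assumes that a metadata-cache miss simply adds one extra DRAM access to fetch the tag before the data access.

```python
def tags_in_dram_hit_latency(meta_hit_rate, meta_cache_latency,
                             dram_tag_latency, dram_data_latency):
    """Toy expected latency (in cycles) of a DRAM-cache hit with tags in DRAM.

    Hypothetical illustration only, not the paper's model. Assumes every
    access first probes an SRAM metadata cache; on a metadata miss, one
    extra DRAM access fetches the tag before the data is read.
    """
    if not 0.0 <= meta_hit_rate <= 1.0:
        raise ValueError("meta_hit_rate must be between 0 and 1")
    # The tag fetch from DRAM is paid only on the fraction of metadata misses.
    expected_tag_latency = (meta_cache_latency
                            + (1.0 - meta_hit_rate) * dram_tag_latency)
    return expected_tag_latency + dram_data_latency
```

With assumed latencies of 2 cycles for the metadata cache and 40 cycles each for the DRAM tag and data accesses, calling the function with `meta_hit_rate` values of 1.0, 0.5, and 0.0 shows the expected latency growing linearly with the metadata miss rate, which is the intuition behind wanting very high hit rates in the metadata cache/predictor.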