Scalable Distributed Last-Level TLBs Using Low-Latency Interconnects

MICRO 2018

Cited by 26 | Viewed 58
Abstract
Recent studies have shown the potential of last-level TLBs shared by multiple cores in tackling memory translation performance challenges posed by "big data" workloads. A key stumbling block hindering their effectiveness, however, is their high access time. We present a design methodology to reduce these high access times so as to realize high-performance and scalable shared L2 TLBs. As a first step, we study the benefits of replacing monolithic shared TLBs with a distributed set of small TLB slices. While this approach does reduce TLB lookup latency, it increases interconnect delays in accessing remote slices. Therefore, as a second step, we devise a lightweight single-cycle interconnect among the TLB slices by tailoring wires and switches to the unique communication characteristics of memory translation requests and responses. Our approach, which we dub Nocstar (NOCs for scalable TLB architecture), combines the high hit rates of shared TLBs with low access times of private L2 TLBs, enabling significant system performance benefits.
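To make the distributed design concrete, here is a minimal sketch (not from the paper) of a shared L2 TLB partitioned into small slices, where each virtual page number (VPN) is hashed to a home slice. The slice count, capacity, eviction policy, and modulo hash are all illustrative assumptions; Nocstar's actual slice organization and interconnect are far more sophisticated.

```python
# Illustrative sketch: a shared last-level TLB split into small
# distributed slices. Slice count, capacity, and the hash function
# are assumptions, not details from the paper.

class TLBSlice:
    """One small TLB slice: maps VPN -> PPN with FIFO eviction."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}  # insertion-ordered dict doubles as FIFO

    def lookup(self, vpn):
        return self.entries.get(vpn)  # None on a slice miss

    def insert(self, vpn, ppn):
        if vpn not in self.entries and len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # evict oldest
        self.entries[vpn] = ppn


class DistributedL2TLB:
    """Shared L2 TLB built from slices; a request is routed to the
    home slice of its VPN (in hardware, over the interconnect)."""

    def __init__(self, num_slices=8):
        self.slices = [TLBSlice() for _ in range(num_slices)]

    def home_slice(self, vpn):
        # Simple modulo hash; a real design would choose VPN bits
        # that spread translations evenly across slices.
        return self.slices[vpn % len(self.slices)]

    def lookup(self, vpn):
        return self.home_slice(vpn).lookup(vpn)

    def insert(self, vpn, ppn):
        self.home_slice(vpn).insert(vpn, ppn)
```

Because every VPN has exactly one home slice, the aggregate capacity is shared without duplication (the hit-rate benefit of a shared TLB), while each individual slice stays small enough for a fast lookup; the remaining cost is the trip to a remote slice, which the paper's single-cycle interconnect targets.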
Keywords
TLB, caches, network-on-chip, virtual memory