Scalable Distributed Last-Level TLBs Using Low-Latency Interconnects

MICRO 2018

Cited by 26 | Viewed 58
Abstract
Recent studies have shown the potential of last-level TLBs shared by multiple cores in tackling memory translation performance challenges posed by "big data" workloads. A key stumbling block hindering their effectiveness, however, is their high access time. We present a design methodology to reduce these high access times so as to realize high-performance and scalable shared L2 TLBs. As a first step, we study the benefits of replacing monolithic shared TLBs with a distributed set of small TLB slices. While this approach does reduce TLB lookup latency, it increases interconnect delays in accessing remote slices. Therefore, as a second step, we devise a lightweight single-cycle interconnect among the TLB slices by tailoring wires and switches to the unique communication characteristics of memory translation requests and responses. Our approach, which we dub Nocstar (NOCs for scalable TLB architecture), combines the high hit rates of shared TLBs with low access times of private L2 TLBs, enabling significant system performance benefits.
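To make the distributed design concrete, here is a minimal sketch (not from the paper) of a shared L2 TLB partitioned into small slices, where each virtual page number (VPN) is hashed to a home slice. The slice count, capacity, eviction policy, and modulo hash are all illustrative assumptions; Nocstar's actual slice organization and interconnect are far more sophisticated.

```python
# Illustrative sketch: a shared last-level TLB split into small
# distributed slices. Slice count, capacity, and the hash function
# are assumptions, not details from the paper.

class TLBSlice:
    """One small TLB slice: maps VPN -> PPN with FIFO eviction."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}  # insertion-ordered dict doubles as FIFO

    def lookup(self, vpn):
        return self.entries.get(vpn)  # None on a slice miss

    def insert(self, vpn, ppn):
        if vpn not in self.entries and len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # evict oldest
        self.entries[vpn] = ppn


class DistributedL2TLB:
    """Shared L2 TLB built from slices; a request is routed to the
    home slice of its VPN (in hardware, over the interconnect)."""

    def __init__(self, num_slices=8):
        self.slices = [TLBSlice() for _ in range(num_slices)]

    def home_slice(self, vpn):
        # Simple modulo hash; a real design would choose VPN bits
        # that spread translations evenly across slices.
        return self.slices[vpn % len(self.slices)]

    def lookup(self, vpn):
        return self.home_slice(vpn).lookup(vpn)

    def insert(self, vpn, ppn):
        self.home_slice(vpn).insert(vpn, ppn)
```

Because every VPN has exactly one home slice, the aggregate capacity is shared without duplication (the hit-rate benefit of a shared TLB), while each individual slice stays small enough for a fast lookup; the remaining cost is the trip to a remote slice, which the paper's single-cycle interconnect targets.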
Keywords
TLB, caches, network-on-chip, virtual memory