Coherency Traffic Reduction in Manycore Systems

Erdem Derebasoglu,Ismail Kadayif,Ozcan Ozturk

2022 25th Euromicro Conference on Digital System Design (DSD)(2022)

引用 0|浏览6
暂无评分
摘要
With the increasing number of cores in manycore accelerators and chip multiprocessors (CMPs), it gets more challenging to provide cache coherency efficiently. Although the snooping-based protocols are appropriate solutions to small-scale systems, they are inefficient for large systems because of the limited bandwidth. Therefore, large-scale manycores require directory-based solutions where a hardware structure called directory holds the information. This directory keeps track of all memory blocks and which cache stores a copy of these blocks. The directory sends messages only to caches that store relevant blocks and also coordinate simultaneous accesses to a cache block. As directory-based protocols scale to many cores, performance, network-on-chip (NoC) traffic, and bandwidth become major problems. In this paper, we present software mechanisms to improve the effectiveness of directory-based cache coherency in manycore and multicore systems with shared memory. In multithreaded applications, some of the data accesses do not disrupt cache coherency, but they still produce coherency messages among cores such as read-only (private) data. However, if data is accessed by at least two cores and at least one of them is a write operation, it is called shared data and requires cache coherency. In our proposed system, private data and shared data are determined at compile time, and cache coherency protocol only applies to shared data. We implement our approach in two stages. First, we use Andersen's static pointer analysis to analyze the program and mark its private instructions, i.e., instructions that load or store private data. Then, we use these analyses to decide if cache coherency protocol will be applied or not at runtime. Our simulation results on parallel benchmarks show that our approach reduces cycle count, dynamic random access memory (DRAM) accesses, and coherency traffic up to 13%.
更多
查看译文
关键词
CMPs, Coherency, Pointer analysis, Directory
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要