Reducing Ownership Overhead for Load-Store Sequences in Cache-Coherent Multiprocessors

IPDPS (2000)

Abstract
Parallel programs that modify shared data in a cache-coherent multiprocessor with a write-invalidate coherence protocol incur ownership overhead: each write to shared data may require an ownership acquisition. This overhead can have a significant impact on performance in a cache-coherent non-uniform memory architecture (NUMA) multiprocessor. Combining a read request with an ownership acquisition can potentially reduce both write latency and network traffic.

In this paper, we propose a new hardware-based approach for performing this optimization by targeting load-store sequences, which we show to be a superset of migratory sharing. A load-store sequence consists of a global read request followed by a global write action to the same memory location from the same processor, without any intervening access to the same block from any other processor.

We use detailed simulation with four benchmark programs, including one on-line transaction processing workload and operating system execution, to examine the effectiveness of the proposed technique. The results show that the technique reduces write-related latency and network traffic more than previous hardware-based techniques, by up to twice as much.
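
The definition of a load-store sequence given above can be illustrated with a small trace-classification sketch. This is not the hardware mechanism proposed in the paper; it is a minimal Python illustration that assumes a trace of global actions recorded at block granularity. The names `GlobalAccess` and `count_load_store_sequences`, and the trace itself, are hypothetical.

```python
from typing import Iterable, NamedTuple


class GlobalAccess(NamedTuple):
    """One global (coherence-visible) action in a hypothetical trace."""
    proc: int   # processor issuing the action
    op: str     # "R" = global read request, "W" = global write action
    block: int  # address of the memory block (assumed granularity)


def count_load_store_sequences(trace: Iterable[GlobalAccess]) -> int:
    """Count global writes that complete a load-store sequence.

    A write counts when the most recent global action on the same block
    was a read by the same processor, i.e. no other processor accessed
    the block in between.
    """
    last = {}  # block -> (proc, op) of the most recent global action
    count = 0
    for access in trace:
        if access.op == "W" and last.get(access.block) == (access.proc, "R"):
            count += 1
        last[access.block] = (access.proc, access.op)
    return count


# Illustrative trace (made up for this sketch, not taken from the paper).
trace = [
    GlobalAccess(0, "R", 0x40),
    GlobalAccess(0, "W", 0x40),  # read then write by P0: load-store sequence
    GlobalAccess(1, "R", 0x40),
    GlobalAccess(0, "R", 0x40),  # P0 intervenes on the same block
    GlobalAccess(1, "W", 0x40),  # not a load-store sequence for P1
    GlobalAccess(1, "R", 0x80),
    GlobalAccess(1, "W", 0x80),  # read then write by P1: load-store sequence
]

result = count_load_store_sequences(trace)
```

In a trace like this, each counted write is a case where the preceding read request could, in principle, have been combined with the ownership acquisition, which is the optimization opportunity the abstract describes.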
Keywords
ownership overhead, load-store sequence, cache-coherent non-uniform memory architecture, new hardware-based approach, load-store sequences, network traffic, global read request, cache-coherent multiprocessor, cache-coherent multiprocessors, previous hardware-based technique, memory location, ownership acquisition, operating system, operating systems, cache coherence, protocols, transaction processing, read only memory, hardware