Fat Loads: Exploiting Locality Amongst Contemporaneous Load Operations to Optimize Cache Accesses

Vanshika Baoni, Adarsh Mittal, Gurindar S. Sohi

MICRO (2021)

Cited by 4
Abstract
This paper considers locality among load instructions that are being processed contemporaneously within a processor in order to reduce the number of accesses to the memory hierarchy. A simple technique is used to learn and predict the number of contemporaneous accesses to a region of memory and to classify a particular dynamic load as either a normal or a fat load. Fat loads bring additional data into Contemporaneous Load Access Registers (CLARs), from which other contemporaneous loads can be serviced without accessing the L1 cache. Experimental results indicate that with fat loads, along with four or eight cache-line-sized CLARs (256 or 512 bytes), the number of L1 cache accesses can be reduced by 50-60%, resulting in significant energy savings for L1 cache operations. Further, in several cases the reduced latency of loads serviced from a CLAR leads to earlier resolution of some mispredicted branches and a reduction in the number of wrong-path instructions, especially loads.
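
To make the mechanism concrete, below is a minimal sketch in Python of how a CLAR-based load filter could behave. The cache-line size, 4-line (256-byte) CLAR region, number of CLAR entries, FIFO replacement, and the per-PC confidence counter are illustrative assumptions for this toy model, not details of the paper's actual design.

from collections import defaultdict

# --- Illustrative parameters (assumptions, not taken from the paper) ---
LINE_BYTES = 64          # cache-line size
REGION_LINES = 4         # lines brought in by one fat load (256-byte CLAR)
NUM_CLARS = 4            # number of CLAR entries
FAT_THRESHOLD = 2        # per-PC confidence needed to issue a fat load

REGION_BYTES = LINE_BYTES * REGION_LINES


class FatLoadModel:
    """Toy model: decide per load whether to access the L1 cache or a CLAR."""

    def __init__(self):
        self.clars = []                     # base addresses of regions held in CLARs (FIFO)
        self.confidence = defaultdict(int)  # per-PC evidence of contemporaneous region reuse
        self.l1_accesses = 0
        self.clar_hits = 0

    def _region_base(self, addr):
        return (addr // REGION_BYTES) * REGION_BYTES

    def load(self, pc, addr):
        region = self._region_base(addr)

        # A load whose region is already held in a CLAR skips the L1 access.
        if region in self.clars:
            self.clar_hits += 1
            self.confidence[pc] += 1        # reuse observed: strengthen the predictor
            return "clar_hit"

        # Otherwise the load accesses the L1 cache.
        self.l1_accesses += 1

        # If this PC has shown contemporaneous locality before, treat it as a
        # fat load: fill a CLAR with the whole region around the address.
        if self.confidence[pc] >= FAT_THRESHOLD:
            if len(self.clars) >= NUM_CLARS:
                self.clars.pop(0)           # FIFO replacement (assumption)
            self.clars.append(region)
            return "fat_load"

        self.confidence[pc] += 1
        return "normal_load"


if __name__ == "__main__":
    model = FatLoadModel()
    # Sequentially strided loads from one (hypothetical) load PC: after a short
    # warm-up, most of them are serviced from a CLAR rather than the L1 cache.
    for i in range(64):
        model.load(pc=0x400500, addr=0x1000 + 8 * i)
    print(f"L1 accesses: {model.l1_accesses}, CLAR hits: {model.clar_hits}")

On the strided stream above, the sketch reports 4 L1 accesses and 60 CLAR hits, illustrating the kind of L1-access reduction the abstract describes; the paper's predictor and CLAR management are of course more involved than this toy.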