
Experience with Improving Distributed Shared Cache Performance on Tilera's Tile Processor

IEEE Computer Architecture Letters (2011)

Abstract
This paper describes our experience with profiling and optimizing physical locality for the distributed shared cache (DSC) in Tilera's Tile multicore processor. Our approach uses the Tile Processor's hardware performance measurement counters (PMCs) to acquire page-level access pattern profiles. A key problem we address is imprecise PMC interrupts. Our profiling tools use binary analysis to correct for interrupt "skid," thus pinpointing individual memory operations that incur remote DSC slice references and permitting us to sample their access patterns. We use our access pattern profiles to drive page homing optimizations for both heap and static data objects. Our experiments show we can improve physical locality for 5 out of 11 SPLASH2 benchmarks running on 32 cores, enabling 32.9% to 77.9% of DSC references to target the local DSC slice. To our knowledge, this is the first work to demonstrate page homing optimizations on a real system.
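As a rough illustration of how a page-level access profile could drive a homing decision, the sketch below aggregates sampled (core, address) pairs per page and homes each page on the core that touched it most often. This is not the authors' tool: the function names, the 64 KB page size, the sample format, and the "most frequent accessor wins" policy are all assumptions made for illustration, and the abstract does not specify the actual heuristic.

```python
from collections import Counter, defaultdict

PAGE_SIZE = 64 * 1024  # assumed page size in bytes; illustrative only


def build_page_profile(samples, page_size=PAGE_SIZE):
    """Aggregate (core_id, address) samples into per-page, per-core counts."""
    profile = defaultdict(Counter)
    for core_id, address in samples:
        profile[address // page_size][core_id] += 1
    return profile


def choose_page_homes(profile):
    """Home each page on the core with the most sampled accesses to it.

    Hypothetical policy; ties go to the lowest core id for determinism.
    """
    homes = {}
    for page, counts in profile.items():
        homes[page] = min(counts, key=lambda core: (-counts[core], core))
    return homes


def local_fraction(profile, homes):
    """Fraction of sampled references that would target the local slice."""
    total = sum(sum(counts.values()) for counts in profile.values())
    local = sum(counts[homes[page]] for page, counts in profile.items())
    return local / total if total else 0.0


# Made-up samples: (core_id, virtual address) pairs.
samples = [
    (0, 0x10000), (0, 0x10040), (1, 0x10080),
    (1, 0x20000), (1, 0x20010), (2, 0x20020),
]
profile = build_page_profile(samples)
homes = choose_page_homes(profile)
print(homes, local_fraction(profile, homes))
```

With these made-up samples, each page ends up homed on its dominant accessor, and the local fraction is the share of samples issued by a page's home core. A real system would also need to weigh pages shared evenly across many cores, which this sketch does not attempt.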
Key words
page-level access pattern profile, shared cache performance, access pattern profile, access pattern, physical locality, tile multicore processor, remote DSC slice reference, DSC reference, tile processor, local DSC slice, hardware, benchmark testing, process design, multicore processing, registers, optimization