AgileWatts: An Energy-Efficient CPU Core Idle-State Architecture for Latency-Sensitive Server Applications

Jawad Haj Yahya, Haris Volos, Davide B. Bartolini, Georgia Antoniou, Jeremie S. Kim, Zhe Wang, Kleovoulos Kalaitzidis, Tom Rollet, Zhirui Chen, Ye Geng, Onur Mutlu, Yiannakis Sazeides

2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO)(2022)

Cited 1|Views60
No score
Abstract
User-facing applications running in modern datacenters exhibit irregular request patterns and are implemented using a multitude of services with tight latency requirements (30-250 mu s). These characteristics render existing energy-conserving techniques ineffective when processors are idle due to the long transition time (order of 100 mu s) from a deep CPU core idle power state (C-state). While prior works propose management techniques to mitigate this inefficiency, we tackle it at its root with AgileWatts (AW): a new deep CPU core C-state architecture optimized for datacenter server processors targeting latency-sensitive applications. AW drastically reduces the transition latency from deep CPU core idle power states while retaining most of their power savings based on three key ideas. First, AW eliminates the latency (several microseconds) of saving/restoring the core context when powering-off/-on the core in a deep idle state by i) implementing medium-grained power-gates, carefully distributed across the CPU core, and ii) retaining context in the power-ungated domain. Second, AW eliminates the flush latency (several tens of microseconds) of the L1/L2 caches when entering a deep idle state by keeping L1/L2 content power-ungated. A small control logic also remains ungated to serve cache coherence traffic. AW implements cache sleep-mode and leakage reduction for the power-ungated domain by lowering a core's voltage to the minimum operational level. Third, using a state-of-the-art power efficient all-digital phase-locked loop (ADPLL) clock generator, AW keeps the PLL active and locked during the idle state, cutting microseconds of wake-up latency at negligible power cost. Our evaluation with an accurate industrial-grade simulator calibrated against an Intel Skylake server shows that AW reduces the energy consumption of Memcached by up to 71% (35% on average) with <1% end-to-end performance degradation. We observe similar trends for other evaluated services (MySQL and Kafka). AW's new deep C-states C6A and C6AE reduce transition-time by up to 900 x as compared to the deepest existing idle state C6, while consuming only 7% and 5% of the active state (C0) power, respectively.
More
Translated text
Key words
Energy Efficiency,power management,Latency Sensitive applications
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined