Design-effort alloy: Boosting a highly tuned primary core with untuned alternate cores

Computer Design (2014)

Abstract
A commercial flagship superscalar core is a highly tuned machine. Designers spend significant effort tuning the register-transfer-level (RTL) model, circuits, and layout to optimize performance and power. Nonetheless, the one-size-fits-all microarchitecture still suffers from suboptimal performance and power on individual applications. A single-ISA heterogeneous multi-core, with its multiple diverse core designs, has the potential to exploit application diversity. However, tuning multiple core types would incur insurmountable design effort. This paper proposes a new class of single-ISA heterogeneous multi-core processor, called design-effort alloy (DEA). Only one of the core types, called the high-effort core (HEC), is tuned using a high-effort design flow. Much less effort is spent on tuning the other core types, called low-effort cores (LECs). We begin with synthesizable RTL designs of a palette of out-of-order superscalar core types. A LEC and a HEC are designed for each core type: the LEC is based on design automation, and the HEC is derived from its LEC counterpart using frequency and energy scaling factors that account for RTL, circuit, and layout optimizations. The resulting HECs have more than a 2× frequency advantage with only a 1.3× increase in energy consumption compared to their corresponding LECs. From the palette of core types, we find the best 4-core-type DEA processor for 179 SPEC SimPoints (program phases). Our study yielded the following key results: 1) The DEA processor's HEC is the same core type as in the best high-effort homogeneous multi-core, owing to most program phases demonstrating “average” instruction-level behavior and favoring this balanced core. 2) The DEA processor yields a speedup in BIPS³/W of 1%-87%, and a geometric-mean speedup of 25%, on 20 out of 179 SimPoints over the best high-effort homogeneous multi-core. Thus, untuned LECs operating at less than half the frequency of the HEC nonetheless accelerate program phases with “outlier” instruction-level behavior.
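The metric and scaling step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation flow. It assumes that BIPS³/W means (billions of instructions per second)³ divided by watts, that the 1.3× energy factor applies to energy per instruction, and that IPC is unchanged when a LEC is scaled into a HEC. The function names, parameter names, and sample numbers are all hypothetical.

```python
from math import prod


def bips3_per_watt(ipc, freq_ghz, energy_per_instr_nj):
    """Return BIPS^3/W for one program phase on one core.

    Assumption: power is throughput times energy per instruction.
    """
    bips = ipc * freq_ghz                 # billions of instructions per second
    watts = bips * energy_per_instr_nj    # 1e9 instr/s * 1e-9 J/instr = W
    return bips ** 3 / watts


def hec_from_lec(freq_ghz, energy_per_instr_nj,
                 freq_scale=2.0, energy_scale=1.3):
    """Derive hypothetical HEC parameters from a LEC via scaling factors.

    The defaults mirror the abstract's roughly 2x frequency and 1.3x energy
    figures; applying 1.3x to energy per instruction is an assumption.
    """
    return freq_ghz * freq_scale, energy_per_instr_nj * energy_scale


def geomean(values):
    """Geometric mean of positive values, e.g. per-phase speedups."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


if __name__ == "__main__":
    # Made-up LEC operating point, for illustration only.
    lec_freq, lec_epi, ipc = 1.0, 1.0, 1.5
    hec_freq, hec_epi = hec_from_lec(lec_freq, lec_epi)
    lec_score = bips3_per_watt(ipc, lec_freq, lec_epi)
    hec_score = bips3_per_watt(ipc, hec_freq, hec_epi)
    print("HEC/LEC BIPS^3/W ratio at equal IPC:", hec_score / lec_score)
```

Under these assumptions the metric reduces to BIPS² divided by energy per instruction, so a HEC of the same core type beats its LEC by about 4/1.3 at equal IPC. A LEC can therefore only win on a phase where a different core type gives it a substantially higher IPC, which is consistent with the abstract's point about "outlier" phases.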
Key words
electronic design automation, integrated circuit layout, integrated circuit modelling, microprocessor chips, multiprocessing systems, DEA processor, HEC, LEC, application diversity, design automation, design-effort alloy, high-effort design flow, high-effort homogeneous multicore, layout optimization, low-effort cores, primary core, register-transfer-level model, single-ISA heterogeneous multicore, superscalar core