An Architecture for Heterogeneous High-Performance Computing Systems: Motivation and Requirements.

Christoph Hagleitner, Florian Auernhammer, James C. Sexton, Constantinos Evangelinos, Guerney Hunt, Christian Pinto, Michael Johnston, Charles R. Johns, Jim Kahle

2023 IEEE John Vincent Atanasoff International Symposium on Modern Computing (JVA)

Abstract
Today's rapid progress in AI and science is largely fueled by the availability of ever larger and more powerful compute systems. The "classic" HPC systems targeted at executing complex workflows and simulations have recently crossed the exaflop boundary in terms of their double-precision floating-point performance. At the same time, new systems targeted at training large AI models use alternative number representations and are already pushing the limits well beyond the ten-exaflop mark. To continue scaling the performance of large HPC systems, system architects need to address several barriers, including the slowdown of Moore's law, energy-density limitations, production-yield challenges, and practical limits to overall power consumption in the tens-of-MW range. All recent #1 HPC systems already rely on specialized, heterogeneous components to offset the slowdown. As specialization advances, this heterogeneity will evolve from today's CPU-GPU combinations into a broad set of more specialized accelerators, while entirely new computing paradigms, e.g., quantum computing, are emerging. With the continued scaling of total system size and of the compute density within a single node, the intra- and inter-node communication requirements increase accordingly. Today, most available interconnect fabrics that support symmetric multiprocessing (SMP) and/or asymmetric variants of cache-coherent communication are based on proprietary implementations, which prevents the assembly of heterogeneous high-performance systems from components of more than a single vendor. Hence, system architects are looking at ways to assemble innovative high-performance heterogeneous systems using open standards.

Under continued cost constraints, better utilization is desired to match the hardware configuration to software usage needs. The ability to compose virtual compute nodes from a set of disaggregated components is a natural way of approaching the problem. The first challenge of composability currently being tackled is memory disaggregation. The vision of higher utilization and resource sharing is appealing, but low latency and high bandwidth need to be maintained. All these trends and limitations demand a fresh look at the architectures enabling an ecosystem from which domain-specific high-performance computing systems can be assembled. In this presentation, we discuss the motivation and requirements for a new node-level and rack-scale architecture as well as the need for an open, standards-based, composable, high-performance interconnect fabric. The architecture of this system needs to be accompanied by an open and interoperable software stack as well as a fine-grained control plane. The control plane enables and supports composability under tight security and performance constraints.

While composability originated as a way to increase the efficiency of heterogeneous computer systems, it has more recently been proposed as a means for heterogeneous components to share a common memory pool, reduce data traffic, and increase the speed of cooperation among the system components. At the lowest level, it remains critical for CPU and accelerator cores to have their own memory hierarchies (L1, L2, LLC). Shared memory pools could be a very effective mechanism for coordinating a workflow across the heterogeneous system components. Workflows can then evolve from a file-based sharing method to a shared-memory model utilizing a high-speed fabric, e.g., CXL.
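To make the file-to-shared-memory transition concrete, the minimal C sketch below hands an intermediate result from one workflow stage to the next through a fabric-attached memory pool rather than through a file. It assumes the CXL memory is exposed as a Linux DAX character device; the device path /dev/dax0.0, the mapping size, and the single-producer layout are illustrative assumptions, not part of the presentation.

/* Minimal sketch: coordinating two workflow stages through a
 * CXL-attached shared memory pool instead of intermediate files.
 * ASSUMPTIONS: the pool appears as the DAX device /dev/dax0.0 and
 * both stages agree on the `struct stage_buf` layout below. */
#include <fcntl.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define POOL_PATH "/dev/dax0.0"   /* assumed device node */
#define POOL_SIZE (1UL << 21)     /* 2 MiB mapping, DAX-aligned */

struct stage_buf {
    atomic_int ready;             /* 0 = empty, 1 = produced */
    char data[4096];              /* payload handed to next stage */
};

int main(void)
{
    int fd = open(POOL_PATH, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stage_buf *buf = mmap(NULL, POOL_SIZE,
                                 PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    /* Producer stage: write a result, then publish it with a
     * release store so consumers see the data before the flag. */
    strcpy(buf->data, "intermediate result");
    atomic_store_explicit(&buf->ready, 1, memory_order_release);

    /* A consumer stage on another core (or an accelerator with a
     * coherent view of the pool) would wait on the same flag:
     * while (!atomic_load_explicit(&buf->ready, memory_order_acquire)) ; */

    munmap(buf, POOL_SIZE);
    close(fd);
    return 0;
}

The release/acquire pair is what replaces the implicit ordering a file close/open boundary used to provide; everything else is ordinary shared-memory programming against a pool that happens to live behind the fabric.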
Composability also addresses the sustainability issues of large computing systems by allowing for upgrades of individual parts of the system. The use of heterogeneous components must expand from the current rack-level or board-level integration down to chiplet-based modules, and even System-on-Chip (SoC) designs, depending on the scale and demands of a workflow. A standards-based coherent interconnect fabric is a key element that will allow innovations from different heterogeneous components to be mixed beyond the limitations of any single vendor, and it is also a key ingredient for an industry growth play. For board-level connections outside of the SMP fabric, the evolving CXL standards are a good match for this role, as they support a traditional I/O connect plus a scalable memory extension. CXL over the emerging UCIe connection standard offers the possibility to extend this value proposition to a chiplet-based ecosystem where tight integration into the SMP fabric is not required. For extended reach, CXL over an optics standard would provide for even larger-scale composable systems.

The composable elements in a compute fabric need a distributed control structure for initialization, resource management, and workflow control, and open standards such as OFMF will play a critical role in the overall system management, as sketched below. The emergence of confidential computing as a paradigm for reducing the trusted computing base (TCB) of a computation is also essential for HPC and cloud. Open standards are required to enable confidential computing's trusted execution environments (TEEs) across heterogeneous elements. Security features will be required to address supply-chain attacks, provide secure and trusted boot, authenticate and attest each component, and enable secure and confidential communication between the various components of the heterogeneous system.
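As an illustration of what the control plane's composition step could look like from a client's perspective, the hedged C sketch below asks a fabric manager to compose a virtual node from disaggregated components over a Redfish-style REST interface, the model that OFMF-based management builds on. The endpoint URL, hostname, and request body are illustrative assumptions, not a published OFMF API.

/* Hedged sketch: requesting composition of a virtual compute node
 * from disaggregated components via a Redfish-style REST call, as an
 * OFMF-based control plane might expose. ASSUMPTIONS: the endpoint
 * path, host name, and JSON schema below are invented for this
 * example; only the libcurl calls are real. */
#include <curl/curl.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical description of the node to compose. */
    const char *body =
        "{ \"Name\": \"hpc-node-0\","
        "  \"Processors\": 2,"
        "  \"CXLMemoryGiB\": 512,"
        "  \"Accelerators\": [\"gpu\", \"fpga\"] }";

    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    struct curl_slist *hdrs =
        curl_slist_append(NULL, "Content-Type: application/json");

    /* Hypothetical composition endpoint on the fabric manager. */
    curl_easy_setopt(curl, CURLOPT_URL,
        "https://fabric-manager.local/redfish/v1/CompositionService"
        "/Actions/CompositionService.Compose");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);

    CURLcode rc = curl_easy_perform(curl);
    if (rc != CURLE_OK)
        fprintf(stderr, "compose request failed: %s\n",
                curl_easy_strerror(rc));

    curl_slist_free_all(hdrs);
    curl_easy_cleanup(curl);
    return rc == CURLE_OK ? 0 : 1;
}

In a deployment meeting the security requirements above, such a request would additionally carry authentication and be preceded by attestation of the components being composed.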
Keywords
heterogeneous computing systems, CXL, interconnect fabric