On the Use of Vectorization in Production Engineering Workloads.

OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information) (2018)

Abstract
Arguably, one of the greatest successes of early Cray supercomputers was the use of highly efficient vector-based computation coupled to a balanced memory subsystem. In addition, efficient scalar throughput helped establish Seymour Cray’s designs as the benchmark by which other systems were measured. Recent high-performance processor designs have seen a resurgence in the use of vector-like hardware units. Some in the industry have also argued that the use of accelerators such as GPUs will lead to a greater focus on writing code that is amenable to vector-based computing. For the authors of this paper, a motivation to focus on efficient vectorization is to improve the performance of the production ASC Trinity supercomputing platform, which comprises approximately 9,000 nodes of dual-socket Haswell processors and 9,500 nodes of Intel Knights Landing sockets, both of which gain much of their computational prowess from the use of vector units in the floating-point pipeline. In this paper, we describe a study of several modern production engineering codes that are routinely used at Sandia National Laboratories and other important NNSA computing partner sites, evaluating the levels of utilization of vector units and the performance benefits obtained from vectorized computation. Our results show varying levels of benefit: vectorization is not always faster. Additionally, we show the ratio of vector to integer/logical instructions, which provides some insight into why even highly vectorized code does not achieve high levels of performance improvement.
Keywords: Vectorization, SIMD, HPC, Workload, Analysis
Keywords
production engineering workloads,vectorization