High-Performance PDES on Manycore Clusters

PADS (2021)

Abstract
Performance and scalability of Parallel Discrete Event Simulation (PDES) are often limited by fine-grain communication, especially in execution environments with high communication cost. The low latency of on-chip communication in emerging manycore processors promises to substantially alleviate conventional PDES bottlenecks. However, scaling to manycore clusters requires balancing faster on-chip communication with slower traditional network communication between cluster nodes. In this work, we investigate the performance of PDES on a cluster of Intel's Knights Landing (KNL) processors, identify performance bottlenecks, and propose techniques to address them. Specifically, we propose three performance optimizations: (1) a new design of the communication buffer centered around the use of atomic compare-and-swap operations to reduce synchronization overhead between a dedicated communication thread and computation threads; (2) careful selection of the number of computation threads per communication thread to limit the pressure on each communication thread; and (3) balancing the timing of communication and computation threads to ensure their synchronized forward progress. Combined, these optimizations result in a 2X to 16X speedup over baseline implementations in the ROSS simulator.
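
The abstract does not describe the buffer's internals, so the following is only a minimal C++ sketch of the general idea behind optimization (1): several computation threads hand events to one dedicated communication thread through a lock-free list updated with atomic compare-and-swap. The names `Event`, `Outbox`, and `run_demo` are hypothetical, the event fields are illustrative, and the code is not taken from ROSS or from the paper.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical event record; the fields are illustrative, not ROSS's layout.
struct Event {
    double timestamp;
    int dest_node;
    Event* next = nullptr;
};

// Multi-producer, single-consumer outbox (an assumed design, not the paper's).
class Outbox {
public:
    // Computation threads: link the event in front of the list with a CAS loop.
    void push(Event* e) {
        Event* old_head = head_.load(std::memory_order_relaxed);
        do {
            e->next = old_head;  // old_head is refreshed when the CAS fails
        } while (!head_.compare_exchange_weak(old_head, e,
                                              std::memory_order_release,
                                              std::memory_order_relaxed));
    }

    // Communication thread: detach the whole list in one atomic exchange.
    // The returned list is in LIFO order.
    Event* drain() {
        return head_.exchange(nullptr, std::memory_order_acquire);
    }

private:
    std::atomic<Event*> head_{nullptr};
};

// Runs n_workers computation threads against one communication thread and
// returns how many events the communication thread drained.
std::size_t run_demo(int n_workers, int events_per_worker) {
    assert(n_workers > 0 && events_per_worker >= 0);
    Outbox outbox;
    std::vector<std::vector<Event>> storage(
        n_workers, std::vector<Event>(events_per_worker));
    std::atomic<int> workers_done{0};

    std::vector<std::thread> workers;
    for (int w = 0; w < n_workers; ++w) {
        workers.emplace_back([&, w] {
            for (int i = 0; i < events_per_worker; ++i) {
                storage[w][i].timestamp = i;
                storage[w][i].dest_node = w;
                outbox.push(&storage[w][i]);
            }
            workers_done.fetch_add(1, std::memory_order_release);
        });
    }

    std::size_t sent = 0;
    std::thread comm([&] {
        bool finished = false;
        while (!finished) {
            // Read the flag before draining so the final pass sees every push.
            finished = workers_done.load(std::memory_order_acquire) == n_workers;
            for (Event* e = outbox.drain(); e != nullptr; e = e->next) {
                ++sent;  // a real communication thread would send e here
            }
        }
    });

    for (auto& t : workers) t.join();
    comm.join();
    return sent;
}
```

Built with thread support enabled (for example `-pthread` on GCC or Clang), `run_demo(4, 1000)` returns 4000: every pushed event is drained exactly once. Because the consumer removes the whole list with a single exchange rather than popping individual nodes, this sketch avoids the ABA problem that affects CAS-based pop operations.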