Early experiences with large-scale Cray XMT systems

Rome (2009)

Abstract
Several 64-processor XMT systems have now been shipped to customers, and 128-processor, 256-processor, and 512-processor systems have been tested in Cray's development lab. We describe some techniques we have used for tuning performance in the hope that applications will continue to scale on these larger systems. We discuss how the programmer must work with the XMT compiler to extract maximum parallelism and performance, especially from multiply nested loops, and how the performance tools provide vital information about whether and how the compiler has parallelized loops and where performance bottlenecks may be occurring. We also show data indicating that the maximum performance of a given application on an XMT system of a given size is limited by memory or network bandwidth, in a way that is somewhat independent of the number of processors used.
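
The bandwidth-limit observation in the abstract can be pictured with a toy model. The sketch below is only an illustration under stated assumptions, not the paper's method or data: it assumes throughput grows linearly with the number of processors used until it reaches a fixed ceiling set by memory or network bandwidth. The function name `attainable_rate` and both constants are hypothetical.

```python
def attainable_rate(processors, per_proc_rate, bandwidth_limit):
    """Modeled throughput: linear scaling capped by a shared bandwidth ceiling.

    All parameters are hypothetical and in arbitrary "work units per second".
    """
    return min(processors * per_proc_rate, bandwidth_limit)


# Hypothetical values chosen only for illustration (not measurements from the paper).
PER_PROC_RATE = 1.0      # work each processor could sustain on its own
BANDWIDTH_LIMIT = 100.0  # work the memory/network of this system size can sustain

if __name__ == "__main__":
    for p in (64, 128, 256, 512):
        # Once p * PER_PROC_RATE exceeds the ceiling, adding processors no longer helps.
        print(p, attainable_rate(p, PER_PROC_RATE, BANDWIDTH_LIMIT))
```

In this simplified picture the ceiling does not depend on how many processors are used, which is one way to read the abstract's closing claim; real systems would show a softer transition.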
Keywords
performance tool,early experience,size xmt system,performance bottleneck,maximum parallelism,tuning performance,large-scale cray xmt system,xmt compiler,64-processor xmt system,512-processor system,maximum performance,development lab,computer architecture,throughput,scaling,nested loops,prototypes,hardware,system testing,memory bandwidth,switches,bandwidth,irrigation,data mining,multithreading