Harmony: A Harness Monitoring System for the Oak Ridge Leadership Computing Facility

Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning) (2019)

Abstract
Acceptance of a new system requires extensive testing, often comprising hundreds of tests. Summit, the latest flagship supercomputer at the Oak Ridge Leadership Computing Facility (OLCF) and the number one system on the November 2018 Top500 list [2], completed its acceptance testing in 2018. To carry out acceptance, the acceptance test (AT) team uses the OLCF test harness, a tool developed at the OLCF that automates the launch and verification of all acceptance tests. Acceptance requires analyzing test results and classifying every test failure, and the sheer number of tests involved makes these tasks challenging. To perform them more efficiently and to lessen the personnel burden during acceptance testing, we developed Harmony, a monitoring system for the OLCF test harness.
Keywords
high performance computing, large-scale system testing