AutoConstruct: Automated Neural Surrogate Model Building and Deployment for HPC Applications

Proceedings of the 13th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures (FlexScience 2023), 2023

Abstract
Scientific Machine Learning (SciML), which applies machine learning methods to scientific computing problems, has been used in a wide range of HPC applications to improve their performance. However, domain scientists, despite their deep expertise in their own fields, often lack sufficient knowledge of machine learning and computer systems. This gap makes it difficult for them to determine which computations can be replaced by neural network (NN) surrogates. Moreover, NN surrogates in HPC applications typically have sparse input features, which calls for system-level optimization to avoid exhausting memory on accelerators. Consequently, constructing and deploying NN surrogates for HPC applications remains a significant challenge, and no existing tool helps domain scientists systematically build high-quality ML surrogates. To address these issues, we introduce AutoConstruct, a comprehensive solution for SciML in distributed systems. AutoConstruct consists of two components, the model-maker and the deployment-helper, which automatically select an appropriate ML surrogate architecture and deploy surrogate model training and inference on specific accelerators, respectively. We demonstrate AutoConstruct's adaptability and performance on a series of HPC numerical benchmarks, real-world scientific applications, and computing platforms.
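The abstract notes that sparse input features can exhaust accelerator memory. The back-of-the-envelope sketch below is a generic illustration, not AutoConstruct's own method: it compares the bytes needed to hold a batch of sparse features in dense form versus compressed sparse row (CSR) form. The batch size, feature count, nonzeros per row, and 4-byte value/index widths are hypothetical values chosen only for the example.

```python
# Hypothetical illustration (not from the AutoConstruct paper): memory needed to
# hold a batch of sparse input features, stored densely vs. in CSR form.
# All sizes below are made-up example values.


def dense_bytes(rows: int, cols: int, value_bytes: int = 4) -> int:
    """Dense storage keeps every entry, including the zeros."""
    return rows * cols * value_bytes


def csr_bytes(rows: int, nnz: int, value_bytes: int = 4, index_bytes: int = 4) -> int:
    """CSR keeps nnz values, nnz column indices, and rows + 1 row pointers."""
    return nnz * (value_bytes + index_bytes) + (rows + 1) * index_bytes


if __name__ == "__main__":
    rows, cols = 100_000, 10_000   # hypothetical batch: 100k samples x 10k features
    nnz_per_row = 10               # hypothetical: only 10 nonzero features per sample
    nnz = rows * nnz_per_row

    d = dense_bytes(rows, cols)
    s = csr_bytes(rows, nnz)
    print(f"dense: {d / 2**20:.1f} MiB")
    print(f"csr:   {s / 2**20:.1f} MiB")
```

With these assumed sizes the dense layout needs roughly 3.8 GB while CSR needs about 8 MB, which is the kind of gap that motivates storing sparse surrogate inputs in a compressed format on the accelerator.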
Keywords
Scientific Machine Learning, Surrogate Model Construction, Bayesian Optimization, Distributed ML Training