Towards Characterizing DNNs to Estimate Training Time using HARP (HPC Application Resource (runtime) Predictor)

Manikya Swathi Vallabhajosyula, Rajiv Ramnath

PEARC (2023)

Abstract
Training DNN models to high accuracy is resource intensive and requires high-performance computing resources. These resources come at a cost, and repeatedly training models with default allocations (a complete node) for significant periods is expensive. Allocating resources optimally (roughly as needed by the job) allows the user to cut execution costs, sometimes without compromising execution times. It also enables better utilization of clusters by making them more available. Fine-tuning every job by hand is exhausting, given the time needed to learn and understand the application and hardware characteristics. We built a framework called HARP that tries to learn from execution patterns (with some help from the user) and predict resource needs for the required configurations. This study explores the potential scalability of such models across different axes: input, hardware, and application hyperparameters. We also explore the transferability of such models between similar applications/models (a 16-layer DNN and VGG16, or VGG16 and ResNet50).
Keywords
estimate training time, runtime, DNNs, HPC application resource, HARP