High Performance Computing Queue Time Prediction Using Clustering and Regression

Scott Hutchison, Daniel Andresen, Mitchell Neilsen, William Hsu, Benjamin Parsons

Parallel Processing and Applied Mathematics (2023)

Abstract
High Performance Computing (HPC) users are often given little or no information at job submission time about how long their job will wait in the queue before it begins execution. Foreknowledge of a long queue time can inform an HPC user’s decision to migrate jobs to commercial cloud infrastructure and receive results sooner. Various researchers have used different machine learning techniques to build queue time estimators. This research applies the proven technique of K-Means clustering followed by Gradient Boosted Tree regression to more than 700,000 jobs actually submitted to an HPC system, predicting a submitted job’s queue time from HPC system characteristics and user-provided job requirements. Applied to HPC queue time prediction, this method achieves better than 96% accuracy at classifying whether a job will start prior to an assigned deadline. Additionally, this research shows that historical HPC CPU allocation data can be used to predict future increases or decreases in job queue time with accuracy exceeding 96%.
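The cluster-then-regress idea named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The two features (log2 of requested cores and the fraction of CPUs in use), the synthetic data, the depth-1 trees, the 60-minute deadline, and all function names are assumptions made for illustration only. The accuracy it prints is measured on the synthetic training data and says nothing about the paper's results.

```python
import math
import random

def dist2(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest(x, centroids, allowed=None):
    """Index of the closest centroid, optionally restricted to `allowed` ids."""
    ids = range(len(centroids)) if allowed is None else allowed
    return min(ids, key=lambda i: dist2(x, centroids[i]))

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm; returns a list of k centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

def fit_stump(X, residuals):
    """Depth-1 regression tree minimising squared error: (feature, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [r for x, r in zip(X, residuals) if x[j] <= t]
            right = [r for x, r in zip(X, residuals) if x[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: predict the mean residual everywhere
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    return best[1:]

def stump_predict(stump, x):
    j, t, lm, rm = stump
    return lm if x[j] <= t else rm

def fit_gbt(X, y, n_trees=30, lr=0.2):
    """Gradient boosting for squared loss: each tree fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        stump = fit_stump(X, [yi - pi for yi, pi in zip(y, pred)])
        stumps.append(stump)
        pred = [pi + lr * stump_predict(stump, x) for pi, x in zip(pred, X)]
    return base, lr, stumps

def gbt_predict(model, x):
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, x) for s in stumps)

def fit_pipeline(X, y, k=2):
    """Cluster the jobs, then train one boosted regressor per non-empty cluster."""
    centroids = kmeans(X, k)
    labels = [nearest(x, centroids) for x in X]
    models = {}
    for c in set(labels):
        rows = [i for i, lab in enumerate(labels) if lab == c]
        models[c] = fit_gbt([X[i] for i in rows], [y[i] for i in rows])
    return centroids, models

def predict_queue_minutes(pipeline, x):
    centroids, models = pipeline
    return gbt_predict(models[nearest(x, centroids, allowed=models)], x)

# Synthetic jobs (assumption): small and large requests with different queue behaviour.
rng = random.Random(42)
X, y = [], []
for _ in range(120):
    big = rng.random() < 0.5
    cores = rng.randint(128, 512) if big else rng.randint(1, 16)
    load = rng.random()  # hypothetical fraction of CPUs already allocated
    X.append((math.log2(cores), load))
    y.append((120 + 600 * load if big else 5 + 60 * load) + rng.gauss(0, 3))

pipeline = fit_pipeline(X, y, k=2)
deadline = 60.0  # minutes; illustrative deadline for the start-before-deadline check
hits = sum((predict_queue_minutes(pipeline, x) <= deadline) == (t <= deadline)
           for x, t in zip(X, y))
accuracy = hits / len(X)
print(f"training-set deadline accuracy: {accuracy:.2f}")
```

Training a separate regressor per cluster lets each model specialise on jobs with similar characteristics, which is the motivation for clustering before regression. In practice, a maintained library implementation of K-Means and gradient boosted trees, evaluated on held-out jobs, would likely replace these hand-written helpers.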
Keywords
prediction,clustering,high performance