Baechi: fast device placement of machine learning graphs

SoCC '20: ACM Symposium on Cloud Computing, Virtual Event, USA, October 2020

Cited by 12 | Viewed 149
Abstract
Machine learning graphs (or models) can be challenging or impossible to train when devices have limited memory or the models are large. Today, splitting a model graph across multiple devices largely relies on learning-based approaches to generate the placement. While the resulting models train fast on data (i.e., with low step times), learning-based model parallelism is itself time-consuming, taking many hours or days to create a placement plan of operators on devices. We present the Baechi system, which adopts an algorithmic approach to the placement problem for running machine learning training graphs on a small cluster of memory-constrained devices. We implemented Baechi to work modularly with TensorFlow. Our experimental results using GPUs show that Baechi generates placement plans 654X--206KX faster than today's learning-based approaches, while the placed model's step time is at most 6.2% higher than that of expert-based placements.
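To illustrate the flavor of an algorithmic (rather than learning-based) placement, the sketch below implements a generic earliest-finish-time greedy scheduler over a topologically ordered operator graph, subject to a per-device memory cap. This is a hypothetical, simplified illustration of the general technique, not Baechi's actual algorithm or code; all names (`place`, `ops`, `deps`, etc.) are invented for this example, and communication costs between devices are ignored.

```python
# Hypothetical sketch: greedy earliest-finish-time placement under
# per-device memory caps. Illustrative only -- not Baechi's implementation.

def place(ops, deps, mem, cost, devices, cap):
    """ops: op names in topological order
    deps: op -> list of predecessor ops
    mem, cost: op -> memory footprint / compute time
    devices: device ids; cap: memory limit per device"""
    free = {d: cap for d in devices}    # remaining memory per device
    busy = {d: 0.0 for d in devices}    # time each device becomes free
    finish, placement = {}, {}
    for op in ops:
        # op is ready once all predecessors have finished
        ready = max((finish[p] for p in deps.get(op, [])), default=0.0)
        # only devices with enough remaining memory are candidates
        cands = [d for d in devices if free[d] >= mem[op]]
        if not cands:
            raise MemoryError(f"op {op} fits on no device")
        # pick the candidate that finishes this op earliest
        best = min(cands, key=lambda d: max(busy[d], ready) + cost[op])
        start = max(busy[best], ready)
        finish[op] = start + cost[op]
        busy[best] = finish[op]
        free[best] -= mem[op]
        placement[op] = best
    return placement, max(finish.values(), default=0.0)


# Usage: a 3-op chain a -> b -> c, two devices, cap of 2 memory units.
ops = ["a", "b", "c"]
deps = {"b": ["a"], "c": ["b"]}
mem = {o: 1 for o in ops}
cost = {o: 1.0 for o in ops}
placement, makespan = place(ops, deps, mem, cost, ["gpu0", "gpu1"], cap=2)
# c no longer fits on gpu0 (memory exhausted), so it spills to gpu1
```

Because the plan is computed by a single greedy pass rather than trained, it completes in milliseconds even for large graphs, which is the key source of the speedup the abstract reports over learning-based planners.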
Keywords
fast device placement,graphs,machine learning