CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs
European Conference on Computer Systems (2023)
Abstract
Deep Neural Networks (DNNs) have shown excellent performance in a wide range
of machine learning applications. Knowing the latency of running a DNN model or
tensor program on a specific device is useful in various tasks, such as DNN
graph- or tensor-level optimization and device selection. Because the large
space of DNN models and devices impedes direct profiling of all
combinations, recent efforts focus on building a predictor to model the
performance of DNN models on different devices. However, none of the existing
attempts have achieved a cost model that can accurately predict the performance
of various tensor programs while supporting both training and inference
accelerators. We propose CDMPP, an efficient tensor program latency prediction
framework for both cross-model and cross-device prediction. We design an
informative but efficient representation of tensor programs, called compact
ASTs, and a pre-order-based positional encoding method, to capture the internal
structure of tensor programs. We develop a domain-adaptation-inspired method to
learn domain-invariant representations and devise a KMeans-based sampling
algorithm so that the predictor can learn from different domains (i.e., different
DNN operators and devices). Our extensive experiments on a diverse range of DNN
models and devices demonstrate that CDMPP significantly outperforms
state-of-the-art baselines with 14.03% and 10.85% prediction error for
cross-model and cross-device prediction, respectively, and one order of
magnitude higher training efficiency. The implementation and the expanded
dataset are available at https://github.com/joapolarbear/cdmpp.
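
The abstract does not spell out how the pre-order-based positional encoding is computed. As a rough illustration only, the sketch below assumes each AST node receives its pre-order visit index as a position and that this index is mapped to a Transformer-style sinusoidal vector. The `Node` class, the node labels, and the functions `preorder`, `sinusoidal`, and `encode_tree` are hypothetical names introduced for this example and are not taken from the CDMPP code base.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node: a label plus an ordered list of children."""
    label: str
    children: list[Node] = field(default_factory=list)


def preorder(root: Node) -> list[Node]:
    """Return nodes in pre-order (parent first, then children left to right)."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so the leftmost child is popped first.
        stack.extend(reversed(node.children))
    return order


def sinusoidal(pos: int, dim: int) -> list[float]:
    """Sinusoidal encoding of an integer position (assumed, not from the paper)."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def encode_tree(root: Node, dim: int = 8) -> list[tuple[str, list[float]]]:
    """Pair every node label with the encoding of its pre-order index."""
    return [(n.label, sinusoidal(p, dim)) for p, n in enumerate(preorder(root))]


# Toy loop nest: for_i { for_j { load; mul }; store }
tree = Node("for_i", [Node("for_j", [Node("load"), Node("mul")]), Node("store")])
labels = [label for label, _ in encode_tree(tree)]
```

Here `labels` lists the nodes in the order a pre-order walk visits them, so two nodes with the same label but different positions in the loop nest receive different encodings. That is the property a structural positional encoding is meant to provide.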