Enabling One-Size-Fits-All Compilation Optimization for Inference Across Machine Learning Computers

IEEE Transactions on Computers (2022)

Abstract
Machine Learning Computers (MLCs) with tensor functional units (e.g., NVIDIA's Tensor Core, Google's TPU, and Habana's Tensor Processor Core) have proliferated in recent years. The broad diversity of MLCs makes it hard to deploy machine learning workloads with optimized performance. Although deep learning compilers (e.g., TVM) are effective at producing optimized code for different hardware back-ends, deploying to a new MLC is tedious: implementing platform-specific compilation optimizations requires a thorough understanding of system and architectural details. To address this problem, we propose a holistic approach that achieves one-size-fits-all compilation optimization for inference across different MLCs. The key observation is that diverse MLCs share several key architectural characteristics for tensor processing (e.g., tensor primitives and on-chip scratchpad memory), which can be generalized to conduct cross-platform compilation optimizations. Concretely, we propose the Tensor Abstract Machine (TAM), which captures these common architectural characteristics, as an abstraction of a broad range of MLCs. To exploit the TAM's architectural characteristics, we propose the Tensor Scheduling Language (TSL), which consists of a tensor computation description and tensor scheduling primitives for implementing operations with portable optimizations. By implementing tensor operations in TSL, optimized code for different MLCs can be generated automatically. To validate our proposal, we conduct experiments on three commodity MLCs: a GPU with Tensor Cores, VTA (on an FPGA), and a Cloud TPU. Experimental results demonstrate that code generated from the same optimization schedule achieves 1.05x to 2.05x better performance than hand-tuned libraries and deep learning compilers across the different platforms.
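The schedule structure the abstract alludes to can be illustrated with a minimal sketch. TSL itself is not specified in this abstract, so the code below is a hypothetical plain-Python analogue of what such scheduling primitives express for a matrix multiplication: split the output into tiles sized for on-chip scratchpad, iterate the tiled reduction axis, and dispatch each tile computation to a tensor primitive (here, a small dense matmul standing in for a hardware instruction such as a Tensor Core MMA). All names (`TILE`, `tensor_primitive`, `tiled_matmul`) are illustrative, not from the paper.

```python
TILE = 2  # hypothetical tile size matching the tensor primitive's shape

def tensor_primitive(a, b):
    # Stand-in for a hardware tensor instruction: multiplies two
    # TILE x TILE tiles and returns the TILE x TILE product.
    return [[sum(a[i][k] * b[k][j] for k in range(TILE))
             for j in range(TILE)] for i in range(TILE)]

def tile_of(M, r, c):
    # Extract the TILE x TILE tile of M whose top-left corner is (r, c),
    # modeling a scratchpad load.
    return [row[c:c + TILE] for row in M[r:r + TILE]]

def tiled_matmul(A, B):
    # Tiled schedule: outer loops split the output space; the inner loop
    # tiles the reduction axis; the accumulator models a scratchpad buffer.
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i in range(0, M, TILE):
        for j in range(0, N, TILE):
            acc = [[0.0] * TILE for _ in range(TILE)]  # scratchpad tile
            for k in range(0, K, TILE):
                t = tensor_primitive(tile_of(A, i, k), tile_of(B, k, j))
                for x in range(TILE):
                    for y in range(TILE):
                        acc[x][y] += t[x][y]
            for x in range(TILE):  # write the finished tile back
                for y in range(TILE):
                    C[i + x][j + y] = acc[x][y]
    return C
```

On a real MLC, a compiler following this pattern would map the tile loops to memory-transfer commands and `tensor_primitive` to the platform's tensor instruction; the portable part is the schedule, while the back-end supplies the primitive.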
Keywords
Machine learning computers, compilation optimization, tensor operations