Accelerating AI Applications with Sparse Matrix Compression in Halide

Journal of Signal Processing Systems (2022)

Abstract
Machine learning profoundly impacts every aspect of our lives, and techniques such as deep learning continue to improve its accuracy and performance. Nonetheless, large-scale computations with large memory footprints remain a bottleneck for deep learning applications. Among the most computationally demanding DNN operations are the matrix multiplications behind the convolution layer, which preserves the spatial arrangement of an image and takes local image patches as input features, and the fully connected layer. Our goal is to give programmers an effective way to improve the performance of such matrix multiplication layers. Halide is an image processing programming language that separates an algorithm from its schedule; with its built-in scheduling primitives, one can improve the performance of code without rewriting the algorithm. In this paper, we propose sparse matrix compression scheduling primitives for Halide that support several compression schemes, and we use them to accelerate convolution implemented with the im2col method: compressing the matrix improves the performance of the convolution. The proposed compression scheduling also benefits natural language processing (NLP). A word embedding model maps words to multidimensional vectors, turning tokens that carry no numerical meaning into vectors that encode semantic meaning. We focus on the word representation application in FastText, in which general matrix-vector multiplication (GEMV) is one of the most computationally intensive operations; we refine the software architecture of FastText and preprocess its pretrained model ahead of time. Our experiments show that the proposed design improves the performance of both convolution and GEMV.
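The abstract leans on two ideas that short sketches can make concrete; both sketches are illustrative, written against standard public APIs, and neither reproduces the paper's proposed primitives. The first is Halide's separation of an algorithm from its schedule, shown here on a stock 3x3 box blur using Halide's ordinary C++ API (the paper's compression primitives extend this scheduling layer):

    // Minimal Halide sketch: the algorithm states what to compute,
    // the schedule states how, and the two can vary independently.
    #include "Halide.h"
    using namespace Halide;

    int main() {
        Var x("x"), y("y"), xi("xi"), yi("yi");

        // Algorithm: a 3x3 box blur over a placeholder input.
        Func in("in"), blur_x("blur_x"), blur_y("blur_y");
        in(x, y) = cast<float>(x + y);
        blur_x(x, y) = (in(x - 1, y) + in(x, y) + in(x + 1, y)) / 3.0f;
        blur_y(x, y) = (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3.0f;

        // Schedule: tiling, vectorization, and parallelism are added
        // with built-in primitives, without touching the algorithm.
        blur_y.tile(x, y, xi, yi, 64, 16)  // loop tiling for locality
              .vectorize(xi, 8)            // SIMD on the inner x loop
              .parallel(y);                // threads over outer tiles
        blur_x.compute_at(blur_y, x)       // produce blur_x per tile
              .vectorize(x, 8);

        Buffer<float> out = blur_y.realize({512, 512});
        return 0;
    }

The second is lossless sparse matrix compression. CSR (compressed sparse row) is one widely used scheme; the plain C++ layout and the csr_gemv kernel below are generic illustrations and may differ from the schemes the paper implements:

    #include <vector>

    // CSR stores only the nonzeros of a matrix plus two index arrays.
    struct CsrMatrix {
        int rows = 0, cols = 0;
        std::vector<float> vals;   // nonzero values, row by row
        std::vector<int> col_idx;  // column index of each nonzero
        std::vector<int> row_ptr;  // rows + 1 offsets into vals/col_idx
    };

    // y = A * x, visiting only the stored nonzeros.
    void csr_gemv(const CsrMatrix& A, const std::vector<float>& x,
                  std::vector<float>& y) {
        y.assign(A.rows, 0.0f);
        for (int r = 0; r < A.rows; ++r)
            for (int k = A.row_ptr[r]; k < A.row_ptr[r + 1]; ++k)
                y[r] += A.vals[k] * x[A.col_idx[k]];
    }

In a GEMV-heavy workload such as FastText word representation, a kernel of this shape skips zero entries entirely, which is the kind of compute and memory-traffic reduction the proposed compression scheduling targets.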
Keywords
SPMM, Lossless compression, AI, Scheduler