NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference
CoRR (2023)
Abstract
The inherent diversity of computation types within individual Deep Neural
Network (DNN) models imposes a corresponding need for a varied set of
computation units within hardware processors. This diversity poses a
significant constraint on computation efficiency during the execution of
different neural networks. In this study, we present NeuralMatrix, a framework
that transforms the computation of entire DNNs into linear matrix operations.
This transformation seamlessly enables the execution of various DNN models
using a single General Matrix Multiplication (GEMM) accelerator.
Extensive experimental results spanning different DNN models demonstrate that
our approach preserves network accuracy while providing both generality and
application-specific levels of computation efficiency. This allows a broad
spectrum of DNN models to be executed using a single GEMM accelerator,
eliminating the need for additional special function units.
Keywords
NeuralMatrix, neural networks, efficient inference, matrix multiplication