Multiplication Through a Single Look-Up-Table (LUT) in CNN Inference Computation

Shiyu Xu, Qi Wang, Xingbo Wang, Shihang Wang, Terry Tao Ye

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2022)

Abstract
Parameter quantization with lower bit width is a common approach to reducing the computation load in CNN inference. With the parameters replaced by fixed-width binaries, multiplication operations can be replaced by look-up-table (LUT) accesses, where the multiplier-multiplicand operand pair serves as the table index and the precalculated products serve as the table entries. Because the histogram profiles of the parameters differ significantly across layers/channels in a CNN, previous LUT-based computation methods have to use a different LUT for each layer/channel, and consequently demand larger memory space along with extra access time and power consumption. In this work, we first normalize the Gaussian profiles of the parameters in different layers/channels to have similar means and variances, and then quantize the normalized parameters to a fixed width through nonlinear quantization. Because of the normalized parameter profile, a single compact LUT ($16\times 16$ entries) can replace all multiplication operations in the whole network. Furthermore, the normalization procedure also reduces the errors induced by quantization. Experiments demonstrate that with a compact 256-entry LUT, we achieve accuracy comparable to 32-bit floating-point computation, while significantly reducing the computation load and memory footprint, along with power consumption and hardware resources.
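To illustrate the core idea from the abstract, the sketch below shows how a single 16x16 (256-entry) table of precomputed products can stand in for multiplications between 4-bit weight and activation codes. This is a minimal illustration, not the authors' implementation: the level grids `weight_levels` and `act_levels` are hypothetical uniform placeholders, whereas the paper derives its levels from nonlinear quantization of normalized (Gaussian-profiled) parameters.

```python
import numpy as np

NUM_LEVELS = 16  # 4-bit quantization -> 16 codes per operand

# Hypothetical reconstruction levels for the 16 codes (assumption: uniform grid;
# the paper uses nonlinear quantization of normalized parameters instead).
weight_levels = np.linspace(-1.0, 1.0, NUM_LEVELS)
act_levels = np.linspace(0.0, 1.0, NUM_LEVELS)

# One shared LUT of precomputed products, indexed by (weight_code, act_code).
product_lut = np.outer(weight_levels, act_levels)  # shape (16, 16), 256 entries

def lut_dot(weight_codes, act_codes):
    """Dot product of quantized vectors using table lookups instead of multiplies.

    weight_codes, act_codes: integer arrays of 4-bit codes in [0, 15].
    """
    # Every multiply becomes an indexed read from the 256-entry table.
    return product_lut[weight_codes, act_codes].sum()

# Toy usage: one output value of a convolution, with a flattened 3x3x3 kernel.
rng = np.random.default_rng(0)
w = rng.integers(0, NUM_LEVELS, size=27)
x = rng.integers(0, NUM_LEVELS, size=27)
print(lut_dot(w, x))
```

Because the table is indexed purely by operand codes, the same 256 entries serve every layer and channel once the parameter profiles have been normalized, which is what removes the need for per-layer/per-channel LUTs.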
Keywords
Convolutional neural network (CNN) acceleration, hardware/software co-design, look-up-table (LUT), low-bit quantization