PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
CVPR 2024
Abstract
Low-precision quantization is recognized for its efficacy in neural network
optimization. Our analysis reveals that non-quantized elementwise operations,
which are prevalent in layers such as parameterized activation functions, batch
normalization, and quantization scaling, dominate the inference cost of
low-precision models. These non-quantized elementwise operations are commonly
overlooked in SOTA efficiency metrics such as Arithmetic Computation Effort
(ACE). In this paper, we propose ACEv2, an extended version of ACE that better
aligns with the inference cost of quantized models and their energy consumption
on ML hardware. Moreover, we introduce PikeLPN, a model that addresses these
efficiency issues by applying quantization to both elementwise operations and
multiply-accumulate operations. In particular, we present a novel quantization
technique for batch normalization layers named QuantNorm, which allows
quantizing the batch normalization parameters without compromising model
performance. Additionally, we propose applying Double Quantization, where the
quantization scaling parameters are themselves quantized. Furthermore, we
recognize and resolve the issue of distribution mismatch in Separable
Convolution layers by introducing Distribution-Heterogeneous Quantization,
which enables quantizing them to low precision. PikeLPN achieves Pareto
optimality in the efficiency-accuracy trade-off, with up to 3X efficiency
improvement compared to SOTA low-precision models.
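To make the Double Quantization idea concrete, here is a minimal sketch of quantizing the quantization scaling parameters themselves. The function name, bit-width, and unsigned uniform quantizer are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def double_quantize_scales(scales: torch.Tensor, bits: int = 8):
    """Illustrative 'double quantization': replace FP32 per-channel scales
    with integer codes plus one shared FP32 second-level scale."""
    qmax = 2 ** bits - 1
    # Second-level scale maps the largest first-level scale onto the integer grid.
    second_level_scale = scales.max() / qmax
    # Integer codes for every first-level scale (scales are positive, so unsigned).
    codes = torch.round(scales / second_level_scale).clamp(0, qmax)
    # Dequantized scales actually used during inference.
    return codes * second_level_scale, codes, second_level_scale
```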
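Similarly, a hedged sketch of the QuantNorm idea: fold batch normalization into a per-channel scale and shift, then store both in low-precision fixed point instead of FP32. The symmetric quantizer and parameter handling below are assumptions and may differ from the paper's method.

```python
import torch

def quantnorm_inference(x, gamma, beta, running_mean, running_var,
                        eps=1e-5, bits=8):
    """Hypothetical sketch: quantize the folded batch-norm parameters so the
    elementwise normalization no longer requires full-precision arithmetic."""
    scale = gamma / torch.sqrt(running_var + eps)   # per-channel multiplier
    shift = beta - running_mean * scale             # per-channel offset

    def to_fixed_point(t):
        # Symmetric uniform quantizer; the real QuantNorm formulation may differ.
        qmax = 2 ** (bits - 1) - 1
        s = t.abs().max() / qmax
        return torch.round(t / s).clamp(-qmax, qmax) * s

    q_scale, q_shift = to_fixed_point(scale), to_fixed_point(shift)
    # Elementwise normalization now uses only quantized parameters.
    return x * q_scale.view(1, -1, 1, 1) + q_shift.view(1, -1, 1, 1)
```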