Squashed Weight Distribution for Low Bit Quantization of Deep Models

Conference of the International Speech Communication Association (INTERSPEECH), 2022

Abstract
Inference with large deep learning models in resource-constrained settings is increasingly a bottleneck in real-world applications of state-of-the-art AI. Here we address this with low-precision weight quantization. We achieve very low accuracy degradation by reparameterizing the weights so that their distribution is approximately uniform. In experiments on the GLUE benchmark (3-bit, 0.2% relative degradation) and on internal intent/slot-filling datasets (2-bit, 0.4% relative degradation), we demonstrate lower bit widths with less accuracy degradation than previously reported.
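The abstract does not spell out the squashing function, so the following is only a minimal sketch of the general idea under an assumption: if the weights are roughly Gaussian, passing them through the Gaussian CDF yields approximately uniform values, which a plain uniform grid can then quantize with few bits. The function names (`squash`, `quantize_uniform`, `dequantize`) and the CDF-based transform are illustrative choices, not the paper's actual method.

```python
import numpy as np
from scipy.stats import norm


def squash(w):
    # Map roughly Gaussian weights through the Gaussian CDF so the
    # transformed values are approximately uniform on [0, 1].
    # (Assumption: the paper's exact squashing function is not given here.)
    mu, sigma = w.mean(), w.std()
    return norm.cdf((w - mu) / sigma), (mu, sigma)


def quantize_uniform(u, bits):
    # Uniformly quantize values in [0, 1] to 2**bits levels.
    levels = 2 ** bits
    q = np.clip(np.round(u * (levels - 1)), 0, levels - 1)
    return q.astype(np.uint8)


def dequantize(q, params, bits):
    # Invert the uniform quantization, then invert the squashing
    # transform via the inverse Gaussian CDF.
    mu, sigma = params
    u = q.astype(np.float64) / (2 ** bits - 1)
    # Clip away exact 0/1 so the inverse CDF stays finite.
    u = np.clip(u, 1e-6, 1 - 1e-6)
    return norm.ppf(u) * sigma + mu


# Usage: 3-bit quantization of synthetic Gaussian "weights".
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=10_000)
u, params = squash(w)
q = quantize_uniform(u, bits=3)
w_hat = dequantize(q, params, bits=3)
print("mean abs reconstruction error:", np.abs(w - w_hat).mean())
```

The intuition behind squashing first is that a uniform distribution fills every quantization bin with roughly the same number of weights, so no levels are wasted on the sparsely populated tails of a Gaussian.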
Keywords
low bit quantization, deep models, squashed weight distribution