eXmY: A Data Type and Technique for Arbitrary Bit Precision Quantization
CoRR (2024)
Abstract
eXmY is a novel data type for quantization of ML models. It supports both
arbitrary bit widths and arbitrary integer and floating-point formats. For
example, it seamlessly supports 3-, 5-, 6-, 7- and 9-bit formats. For a specific
bit width, say 7, it defines all possible formats, e.g. e0m6, e1m5, e2m4, e3m3,
e4m2, e5m1 and e6m0. For non-power-of-two bit widths, e.g. 5, 6 and 7, we created
a novel encoding and decoding scheme that achieves perfect compression and byte
addressability and is amenable to sharding and vector processing. We
implemented libraries for emulation, encoding and decoding tensors and
checkpoints in C++, TensorFlow, JAX and PAX. For optimal performance, the
codecs use SIMD instructions on CPUs and vector instructions on TPUs and GPUs.
eXmY is also a technique and exploits the statistical distribution of exponents
in tensors. It can be used to quantize weights, static and dynamic activations,
gradients, master weights and optimizer state. It can reduce memory usage (CPU
DRAM and accelerator HBM) as well as network and disk storage and transfers. It
can increase multi-tenancy and accelerate compute. eXmY has been deployed in
production for almost 2 years.
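
The format naming and the idea of "perfect compression" for non-power-of-two bit widths can be illustrated with a minimal sketch. This is not the codec from the paper, whose encoding scheme the abstract does not describe. It is a plain little-endian bit stream, chosen only to show that 7-bit codes can be stored with no padding between them and that every group of 8 codes then ends on a byte boundary. The function names `exmy_formats`, `pack_codes` and `unpack_codes` are hypothetical, and the single sign bit is an assumption inferred from the 7-bit example, where exponent and mantissa bits sum to 6.

```python
def exmy_formats(bits):
    """List format names for a total bit width.

    Assumption: one bit is reserved for the sign, so exponent and
    mantissa bits share the remaining bits - 1 bits.
    """
    payload = bits - 1
    return [f"e{e}m{payload - e}" for e in range(payload + 1)]


def pack_codes(codes, bits):
    """Pack unsigned `bits`-wide integer codes into bytes, no gaps between codes."""
    acc = 0      # bit accumulator, least significant bit is emitted first
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for code in codes:
        if not 0 <= code < (1 << bits):
            raise ValueError(f"code {code} does not fit in {bits} bits")
        acc |= code << nbits
        nbits += bits
        while nbits >= 8:          # flush every complete byte
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:                      # final partial byte, zero-padded
        out.append(acc & 0xFF)
    return bytes(out)


def unpack_codes(data, bits, count):
    """Inverse of pack_codes: recover `count` codes of width `bits`."""
    mask = (1 << bits) - 1
    acc = 0
    nbits = 0
    codes = []
    for byte in data:
        acc |= byte << nbits
        nbits += 8
        while nbits >= bits and len(codes) < count:
            codes.append(acc & mask)
            acc >>= bits
            nbits -= bits
    return codes


codes = [0, 1, 2, 3, 125, 126, 127, 64]   # eight 7-bit codes
packed = pack_codes(codes, bits=7)        # 8 * 7 = 56 bits = 7 bytes
restored = unpack_codes(packed, bits=7, count=len(codes))
```

Here `exmy_formats(7)` returns the seven names listed in the abstract, and the eight 7-bit codes occupy exactly 7 bytes instead of 8. Because each group of 8 codes fills a whole number of bytes, groups can be located by byte offset, which is one simple way to obtain byte addressability under these assumptions.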