Hardware-aware compression with Random Operation Access Specific Tile (ROAST) hashing
ICLR 2023 (2023)
Abstract
Advancements in deep learning are often associated with increasing model sizes.
Training and deploying large models require sophisticated hardware and incur
significantly higher costs. Thus, model compression is a widely explored approach
to solving the problem. However, SOTA techniques fall short in one or more
desirable aspects of compression: for instance, pruning does not reduce memory
during training, quantization can only provide up to $32\times$ compression, and HashedNet
is cache-inefficient, etc. This paper proposes a model-agnostic, cache-friendly,
and hardware-aware model compression approach: Random Operation Access
Specific Tile (ROAST) hashing. ROAST collapses the parameters by clubbing them
through a lightweight mapping. While clubbing these parameters, ROAST utilizes
cache hierarchies by aligning the memory access pattern with the parameter access
pattern. ROAST is up to $\sim 25 \times$ faster to train and $\sim 50 \times$ faster to infer than the
popular parameter sharing method HashedNet. Additionally, ROAST introduces
global weight sharing, which is empirically and theoretically superior to local
weight sharing in HashedNet, and which may be of independent interest. With ROAST, we
can efficiently train and deploy the model using a much smaller memory footprint
($\sim 10\times$ to $100\times$ smaller) on text and image classification tasks.
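The abstract describes parameter sharing through a lightweight hash mapping whose memory access pattern follows the parameter access pattern tile by tile. The sketch below is not the authors' implementation; it is a minimal illustration, under assumed names (`ROASTLikePool`, `pool_size`, `tile_size`), of the general idea: whole contiguous tiles of each layer's weights are mapped to contiguous chunks of one global parameter pool, rather than hashing every weight individually as HashedNet does.

```python
# Minimal sketch (not the authors' code) of tile-based, globally shared parameters.
# Per-element hashing (HashedNet-style) scatters reads across the pool; mapping a
# whole tile to one contiguous chunk keeps reads cache-friendly.
import numpy as np

class ROASTLikePool:
    def __init__(self, pool_size: int, tile_size: int, seed: int = 0):
        # One small global pool of real parameters shared by every layer.
        self.pool = np.random.default_rng(seed).standard_normal(pool_size).astype(np.float32)
        self.pool_size = pool_size
        self.tile_size = tile_size

    def _tile_offset(self, layer_id: int, tile_id: int) -> int:
        # Cheap hash of (layer, tile) -> start offset of a contiguous chunk.
        h = hash((layer_id, tile_id)) & 0x7FFFFFFF
        return h % (self.pool_size - self.tile_size)

    def materialize(self, layer_id: int, shape: tuple) -> np.ndarray:
        # Recover a full (virtual) weight matrix by concatenating contiguous
        # tiles read from the shared pool.
        n = int(np.prod(shape))
        tiles = []
        for t in range((n + self.tile_size - 1) // self.tile_size):
            off = self._tile_offset(layer_id, t)
            tiles.append(self.pool[off:off + self.tile_size])
        return np.concatenate(tiles)[:n].reshape(shape)

# Usage: two "layers" with 8192 and 32768 virtual weights drawn from a 4096-entry pool.
pool = ROASTLikePool(pool_size=4096, tile_size=64)
w1 = pool.materialize(layer_id=0, shape=(128, 64))
w2 = pool.materialize(layer_id=1, shape=(256, 128))
print(w1.shape, w2.shape, pool.pool.size)
```

Because every layer indexes the same pool, the sharing is global rather than per-layer (local), which is the distinction the abstract draws against HashedNet.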
Key words
model compression, hardware-aware