BIRD: A Lightweight and Adaptive Compressor for Communication-Efficient Distributed Learning Using Tensor-wise Bi-Random Sampling

2023 IEEE 41st International Conference on Computer Design (ICCD), 2023

Abstract
Top-K sparsification-based compression frameworks are widely employed to reduce communication costs in distributed learning. However, we have identified several issues with existing Top-K sparsification-based compression methods that severely impede their deployment on resource-constrained devices: (i) the limited compressibility of the indexes of the Top-K parameters, which critically restricts the overall communication compression ratio; (ii) the time-consuming compression operations themselves, which significantly offset the benefits of communication compression; and (iii) the high memory footprint of the error feedback techniques used to maintain model quality. To address these issues, we propose BIRD, a lightweight tensor-wise Bi-Random sampling strategy with an expectation-invariance property, which achieves higher compression ratios at lower computational overhead while maintaining comparable model quality without additional memory costs. Specifically, BIRD applies a tensor-wise index sharing mechanism that substantially reduces the proportion of index data by allowing multiple tensor elements to share a single index, thus improving the overall compression ratio. Additionally, BIRD replaces the time-consuming Top-K sorting with a faster Bi-Random sampling strategy built on this index sharing mechanism, thereby reducing the computational cost of compression. Moreover, BIRD incorporates an expectation-invariance property into the Bi-Random sampling to ensure an unbiased representation of the L1-norm of the sampled tensors, effectively maintaining model quality without incurring extra memory costs. Experiments on multiple mainstream machine learning (ML) tasks demonstrate that, compared to state-of-the-art methods, our proposed BIRD achieves a 1.3x-31.1x higher compression ratio at lower time overhead with O(N) complexity, while maintaining model quality without incurring extra memory costs.
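To make the abstract's description more concrete, below is a minimal, hypothetical NumPy sketch of the general idea: one shared index per block of elements, a sorting-free random selection in place of Top-K, and a rescaling step that keeps the L1-norm of the reconstruction equal to that of the original tensor. This is not the authors' implementation; the block size, keep ratio, function names, and the exact rescaling rule are assumptions, and the two-level "Bi-Random" sampling is simplified here to a single random block selection.

```python
import numpy as np

def bird_compress_sketch(tensor, block_size=64, keep_ratio=0.1, rng=None):
    """Illustrative sketch (not the paper's code): block-wise random sampling
    with one shared index per block, plus a rescaling step that preserves the
    L1-norm of the original tensor in the reconstruction."""
    rng = np.random.default_rng() if rng is None else rng
    flat = tensor.ravel()
    # Pad so the tensor splits evenly into blocks; each block shares one index.
    pad = (-flat.size) % block_size
    padded = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    blocks = padded.reshape(-1, block_size)

    n_blocks = blocks.shape[0]
    n_keep = max(1, int(round(keep_ratio * n_blocks)))
    # Random selection replaces Top-K sorting: O(N) work, no sort.
    kept_idx = rng.choice(n_blocks, size=n_keep, replace=False)
    kept_blocks = blocks[kept_idx]

    # L1-norm-preserving rescaling (a stand-in for the expectation-invariance
    # idea): scale the sampled blocks so the reconstruction's L1-norm matches
    # the original tensor's L1-norm.
    total_l1 = np.abs(padded).sum()
    kept_l1 = np.abs(kept_blocks).sum()
    scale = total_l1 / kept_l1 if kept_l1 > 0 else 1.0
    return kept_idx, kept_blocks * scale, tensor.shape, block_size

def bird_decompress_sketch(kept_idx, kept_blocks, shape, block_size):
    """Scatter the scaled blocks back into a zero tensor of the original shape."""
    numel = int(np.prod(shape))
    pad = (-numel) % block_size
    out = np.zeros(((numel + pad) // block_size, block_size),
                   dtype=kept_blocks.dtype)
    out[kept_idx] = kept_blocks
    return out.ravel()[:numel].reshape(shape)

if __name__ == "__main__":
    g = np.random.default_rng(0).standard_normal((256, 128))
    payload = bird_compress_sketch(g, block_size=64, keep_ratio=0.1)
    g_hat = bird_decompress_sketch(*payload)
    print(np.abs(g).sum(), np.abs(g_hat).sum())  # L1 norms match by construction
```

In this sketch, only the sampled block indexes and the rescaled block values would be communicated, so a block of 64 elements amortizes a single index over 64 values instead of paying one index per value as in element-wise Top-K.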
Keywords
Distributed learning, Communication compression, Random sampling, Neural network