A Comprehensive Survey of Compression Algorithms for Language Models
CoRR (2024)
Abstract
How can we compress language models without sacrificing accuracy? The number
of compression algorithms for language models is growing rapidly, driven by the
desire to benefit from the remarkable advances in recent language models without
the side effects of their gigantic size, such as increased carbon emissions and
expensive maintenance costs. While numerous compression algorithms have shown
remarkable progress in compressing language models, it has ironically become
challenging to capture emerging trends and identify the fundamental concepts
underlying them, precisely because of the excessive number of algorithms. In this
paper, we survey and summarize diverse compression algorithms, including pruning,
quantization, knowledge distillation, low-rank approximation, parameter sharing,
and efficient architecture design. We not only summarize the overall trends across
these compression algorithms but also select representative algorithms and provide
in-depth analyses of them. We discuss the value of each category of compression
algorithms and the desired properties of low-cost compression algorithms, which
have become especially important with the emergence of large language models.
Finally, we introduce promising future research topics based on our survey results.
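As a concrete illustration of one of the categories the survey covers, the sketch below shows a minimal symmetric per-tensor int8 weight quantization and dequantization routine. This is an assumed toy example for orientation only, not code or an algorithm from the paper; the function names and the choice of symmetric scaling are illustrative assumptions.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map float weights to [-127, 127].

    Illustrative sketch only; real quantization schemes in the surveyed papers
    differ (per-channel scales, asymmetric ranges, calibration, etc.).
    """
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# Toy weight matrix: storing int8 instead of float32 is 4x smaller,
# at the cost of a small reconstruction error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs reconstruction error:", np.max(np.abs(w - dequantize_int8(q, scale))))
```

The same accuracy-versus-size trade-off motivates the other categories (pruning, distillation, low-rank approximation, and so on) that the survey compares.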