Divergent Token Metrics: Measuring degradation to prune away LLM components – and optimize quantization
arXiv (2023)
Abstract
Large Language Models (LLMs) have reshaped natural language processing with
their impressive capabilities. However, their ever-increasing size has raised
concerns about their effective deployment and the need for LLM compression.
This study introduces the Divergent Token Metrics (DTMs), a novel approach to
assessing compressed LLMs, addressing the limitations of traditional perplexity
or accuracy measures that fail to accurately reflect text generation quality.
DTMs measure token divergences that allow deeper insights into the subtleties
of model compression, in particular, when evaluating components' impacts
individually. Utilizing the First Divergent Token Metric (FDTM) in model
sparsification reveals that 25% of all attention components can be pruned
beyond 90% on the Llama-2 model family, still keeping SOTA performance. For
quantization, FDTM suggests that more than 80% of parameters can naively be
transformed to int8 without special outlier management. These evaluations
indicate the necessity of choosing appropriate compressions for parameters
individually – and that FDTM can identify those – while standard metrics
result in deteriorated outcomes.
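To make the core idea concrete, the sketch below computes the position of the first token at which a compressed model's greedy generation diverges from the original model's for a single prompt. This is a minimal illustration of the divergence signal the abstract describes, not the paper's exact definition: the function name is hypothetical, both inputs are assumed to be pre-generated token-ID sequences, and how the paper normalizes and aggregates such per-prompt indices into the FDTM is not spelled out in the abstract.

```python
from typing import Sequence


def first_divergent_index(ref_tokens: Sequence[int],
                          comp_tokens: Sequence[int]) -> int:
    """Return the position of the first token at which the compressed
    model's output diverges from the reference model's.

    Hypothetical helper: both arguments are token-ID sequences obtained
    by greedily decoding the same prompt with the original model and
    the compressed model, respectively.
    """
    for i, (a, b) in enumerate(zip(ref_tokens, comp_tokens)):
        if a != b:
            return i
    # No divergence within the compared span: return its length.
    return min(len(ref_tokens), len(comp_tokens))


# Example: the compressed model first diverges at position 3.
ref = [101, 7, 42, 13, 99]
comp = [101, 7, 42, 55, 99]
assert first_divergent_index(ref, comp) == 3
```

Under this reading, a later divergence index means a compressed component better preserves the model's generation behavior, which is the kind of per-component signal the abstract says can guide pruning and quantization decisions where perplexity or accuracy alone would not.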