LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit
arXiv (2024)
Abstract
Recent advancements in large language models (LLMs) are propelling us toward
artificial general intelligence with their remarkable emergent abilities and
reasoning capabilities. However, their substantial computational and memory
requirements limit widespread adoption. Quantization, a key compression
technique, can effectively mitigate these demands by compressing and
accelerating LLMs, albeit with potential risks to accuracy. Numerous studies
have aimed to minimize the accuracy loss associated with quantization. However,
because their quantization configurations differ, their results cannot be fairly
compared. In this paper, we present LLMC, a plug-and-play compression toolkit,
to fairly and systematically explore the impact of quantization. LLMC
integrates dozens of algorithms, models, and hardware platforms, offering high
extensibility from integer to floating-point quantization, from LLMs to
vision-language models (VLMs), from fixed-bit to mixed precision, and from
quantization to sparsification. Powered by this versatile toolkit, our
benchmark covers three key aspects: calibration data, algorithms (three
strategies), and data formats, providing novel insights and detailed analyses
for further research and practical guidance for users. Our toolkit is available
at https://github.com/ModelTC/llmc.
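As a point of reference for the compression technique the benchmark studies, the sketch below shows symmetric per-tensor integer quantization, the most basic scheme among those the abstract mentions. This is a generic illustration under assumed settings (8-bit width, round-to-nearest), not code from the LLMC toolkit or any specific algorithm it integrates.

```python
# Minimal sketch of symmetric per-tensor integer quantization.
# Generic illustration only; bit width and rounding are assumptions,
# not the LLMC toolkit's implementation.
import numpy as np

def quantize_symmetric(x: np.ndarray, n_bits: int = 8):
    """Quantize a float tensor to signed integers with a single scale."""
    qmax = 2 ** (n_bits - 1) - 1              # e.g. 127 for 8-bit
    scale = np.abs(x).max() / qmax            # per-tensor scale factor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map integers back to floats; the gap to the original is the quantization error."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(4, 4).astype(np.float32)   # stand-in for a weight tensor
    q, s = quantize_symmetric(w, n_bits=8)
    err = np.abs(w - dequantize(q, s)).max()
    print(f"max abs quantization error: {err:.6f}")
```

The accuracy loss the abstract refers to is exactly this reconstruction error accumulated across layers; the benchmarked algorithms differ in how they choose scales, rounding, and which values to protect.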