KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami
NeurIPS 2024
Cited 145 | Views 103
Keywords: Quantization, KV Cache, LLM Inference, Compression, Long Context Length