GKT: A Novel Guidance-Based Knowledge Transfer Framework for Efficient Cloud-edge Collaboration LLM Deployment

Annual Meeting of the Association for Computational Linguistics (2024)

Abstract
The burgeoning size of Large Language Models (LLMs) has led to enhanced capabilities in generating responses, albeit at the expense of increased inference times and elevated resource demands. Existing methods of acceleration, predominantly hinged on knowledge distillation, generally necessitate fine-tuning of considerably large models, such as Llama-7B, posing a challenge for average users. Furthermore, present techniques for expediting inference and reducing costs operate independently. To address these issues, we introduce a novel and intuitive Guidance-based Knowledge Transfer (GKT) framework. This approach leverages a larger LLM as a "teacher" to create guidance prompts, paired with a smaller "student" model to finalize responses. Remarkably, GKT requires no fine-tuning and doesn't necessitate the teacher and student models to have the same vocabulary, allowing for extensive batch generation to accelerate the process while ensuring user customization. GKT can be seamlessly integrated into cloud-edge collaboration architectures, and is versatile enough for plug-and-play application across various models. It excels in both efficiency and affordability, epitomizing a "cheap and cheerful" solution. GKT achieves a maximum accuracy improvement of 14.18%, along with a 10.72 times speed-up on GSM8K, and an accuracy improvement of 14.00% on CSQA. When utilizing ChatGPT as the teacher model and Llama2-70B as the student model, we can achieve 95.00% of ChatGPT's performance at 52% of the cost. The results demonstrate substantial gains in accuracy and processing speed on the GSM8K and CSQA datasets, surpassing the performance of using either the student or teacher models in isolation.
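The mechanism the abstract describes, in which a large teacher LLM drafts a short guidance prefix that a smaller student model then completes, can be sketched in a few lines. Below is a minimal illustration built on Hugging Face transformers; the Llama-2 checkpoints, the 16-token guidance length, and the bare question-plus-answer prompt are illustrative assumptions, not the paper's reported configuration.

from transformers import AutoModelForCausalLM, AutoTokenizer

TEACHER = "meta-llama/Llama-2-13b-hf"  # larger "teacher"; an assumed choice
STUDENT = "meta-llama/Llama-2-7b-hf"   # smaller "student"; an assumed choice

t_tok = AutoTokenizer.from_pretrained(TEACHER)
teacher = AutoModelForCausalLM.from_pretrained(TEACHER, device_map="auto")
s_tok = AutoTokenizer.from_pretrained(STUDENT)
student = AutoModelForCausalLM.from_pretrained(STUDENT, device_map="auto")

def gkt_generate(question: str, guidance_tokens: int = 16,
                 max_new_tokens: int = 256) -> str:
    # 1) The teacher decodes only a short guidance prefix of the answer,
    #    so the expensive model stops after a handful of tokens.
    t_in = t_tok(question, return_tensors="pt").to(teacher.device)
    t_out = teacher.generate(**t_in, max_new_tokens=guidance_tokens,
                             do_sample=False)
    guidance = t_tok.decode(t_out[0][t_in["input_ids"].shape[1]:],
                            skip_special_tokens=True)

    # 2) The handoff is plain text, so the student needs neither
    #    fine-tuning nor a shared vocabulary: it simply continues
    #    from the question plus the teacher's guidance.
    s_in = s_tok(question + guidance, return_tensors="pt").to(student.device)
    s_out = student.generate(**s_in, max_new_tokens=max_new_tokens,
                             do_sample=False)
    completion = s_tok.decode(s_out[0][s_in["input_ids"].shape[1]:],
                              skip_special_tokens=True)
    return guidance + completion

print(gkt_generate("Q: If 3 pens cost $4.50, how much do 10 pens cost?\nA:"))

Because the handoff is a decoded string rather than shared logits or hidden states, the teacher and student tokenizers can differ freely, which is consistent with the abstract's claim that no vocabulary alignment or fine-tuning is required.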