Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference
arXiv (2024)
Abstract
The customization of large language models (LLMs) for user-specified tasks
is becoming increasingly important. However, maintaining all the customized LLMs on cloud servers
incurs substantial memory and computational overheads, and uploading user data
can also lead to privacy concerns. On-device LLMs can offer a promising
solution by mitigating these issues. Yet, the performance of on-device LLMs is
inherently constrained by the limitations of small-scale models. To overcome
these restrictions, we propose Crayon, a novel approach for on-device LLM
customization. Crayon first constructs a pool of diverse base adapters
and then instantly blends them into a customized adapter without extra
training. In addition, we develop a device-server hybrid inference strategy,
which deftly allocates more demanding queries or non-customized tasks to a
larger, more capable LLM on a server. This ensures optimal performance without
sacrificing the benefits of on-device customization. We carefully craft a novel
benchmark from multiple question-answer datasets, and show the efficacy of our
method in LLM customization.
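
The two ideas in the abstract, blending a pool of base adapters without training and routing harder queries to a server model, can be illustrated with a minimal sketch. The abstract does not say how the blending weights or the routing decision are computed, so everything below is an assumption for illustration: adapters are represented as plain dictionaries of flattened parameters, blending weights come from a softmax over hypothetical relevance scores, and routing uses a hypothetical confidence threshold. None of the names are taken from the authors' implementation.

```python
import math
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened adapter weights.
Adapter = Dict[str, List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Turn arbitrary relevance scores into blending weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend_adapters(pool: List[Adapter], scores: List[float]) -> Adapter:
    """Weighted sum of base adapters; no gradient step is involved.

    Assumption: every adapter in the pool has the same parameter names
    and shapes, and `scores` holds one relevance score per base adapter.
    """
    if not pool or len(pool) != len(scores):
        raise ValueError("need one score per base adapter")
    weights = softmax(scores)
    blended: Adapter = {}
    for name in pool[0]:
        size = len(pool[0][name])
        blended[name] = [
            sum(w * adapter[name][i] for w, adapter in zip(weights, pool))
            for i in range(size)
        ]
    return blended


def route(confidence: float, threshold: float = 0.5) -> str:
    """Assumed routing rule: low-confidence queries go to the server LLM."""
    return "device" if confidence >= threshold else "server"


# Usage: two toy base adapters with equal relevance blend to their average.
pool = [{"lora_A": [1.0, 0.0]}, {"lora_A": [0.0, 1.0]}]
custom = blend_adapters(pool, scores=[0.0, 0.0])
```

Because blending here is only a weighted sum over stored parameters, it costs a single pass over the pool, which is consistent with the "without extra training" claim. The threshold in `route` is a placeholder for whatever criterion decides that a query is too demanding or falls outside the customized task.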