Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection
arXiv (2024)
Abstract
Researchers and practitioners operating on a limited budget face the
cost-performance trade-off dilemma. The challenging decision often centers on
whether to use a large LLM with better performance or a smaller one with
reduced costs. This has motivated recent research in the optimisation of LLM
calls. Either a cascading strategy is used, where a smaller LLM or both are
called sequentially, or a routing strategy is used, where only one model is
ever called. Both strategies depend on a decision criterion, which is
typically implemented by an extra neural model. In this work, we propose a
simpler solution; we use only the uncertainty of the generations of the small
LLM as the decision criterion. We compare our approach with both cascading and
routing strategies using three different pairs of pre-trained small and large
LLMs, on nine different tasks and against approaches that require an additional
neural model. Our experiments reveal this simple solution optimally balances
cost and performance, outperforming existing methods on 25 out of 27
experimental setups.
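The cascading strategy with an uncertainty criterion can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the threshold value, and the use of mean negative token log-probability as the uncertainty score are assumptions for the sketch (the paper may use a different uncertainty measure).

```python
def mean_neg_logprob(token_logprobs):
    """Average negative log-probability of the generated tokens:
    a simple proxy for the model's uncertainty about its own output."""
    return -sum(token_logprobs) / len(token_logprobs)

def two_tier_generate(prompt, small_model, large_model, threshold=1.0):
    """Cascading two-tier selection (hypothetical sketch):
    call the small LLM first and keep its answer if it is confident;
    otherwise escalate and call the large LLM.

    Each model is assumed to be a callable returning
    (generated_text, per-token log-probabilities)."""
    text, logprobs = small_model(prompt)
    if mean_neg_logprob(logprobs) <= threshold:
        return text, "small"   # small model is confident enough
    text_large, _ = large_model(prompt)
    return text_large, "large" # escalate to the large model
```

A routing strategy would differ only in that the criterion is evaluated before any generation, so at most one model is ever called; the cascade above may call both, paying the small model's cost even when it escalates.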