An Investigation of Prompt Variations for Zero-shot LLM-based Rankers
arXiv (2024)
Abstract
We provide a systematic understanding of the impact of specific components
and wordings used in prompts on the effectiveness of rankers based on zero-shot
Large Language Models (LLMs). Several zero-shot ranking methods based on LLMs
have recently been proposed. Among many aspects, these methods differ in (1) the
ranking algorithm they implement, e.g., pointwise vs. listwise; (2) the
backbone LLM used, e.g., GPT-3.5 vs. FLAN-T5; and (3) the components and wording
used in prompts, e.g., whether a role definition (role-playing) is used and the
actual words used to express it. It is currently unclear whether performance
differences are due to the underlying ranking algorithm or to spurious
factors such as a better choice of words in the prompt. This confusion risks
undermining future research. Through large-scale experimentation and
analysis, we find that ranking algorithms do contribute to differences between
methods for zero-shot LLM ranking. However, so do the LLM backbones and, even
more importantly, so does the choice of prompt components and wordings. In
fact, our experiments show that these latter elements at times have more impact
on the ranker's effectiveness than the actual ranking algorithms, and that
differences among ranking methods become more blurred when prompt variations
are considered.
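To make the interaction between ranking algorithm and prompt wording concrete,
here is a minimal sketch. It holds a query and candidate passages fixed, then
crosses a pointwise and a listwise prompt builder with optional role definitions
and alternative task wordings. All template strings, the names ROLE_DEFINITIONS
and TASK_WORDINGS, and the helper functions are hypothetical illustrations. They
are not the prompts or code used in the paper.

```python
from itertools import product

# Hypothetical prompt components (illustrative assumptions, not the paper's templates).
ROLE_DEFINITIONS = [
    None,  # no role-playing component
    "You are an expert search assistant.",
    "You are a search engine that judges document relevance.",
]
TASK_WORDINGS = ["relevant", "useful"]


def pointwise_prompt(role, wording, query, passage):
    """Pointwise: the LLM judges a single passage against the query."""
    parts = [role] if role else []
    parts.append(f"Passage: {passage}")
    parts.append(f"Query: {query}")
    parts.append(f"Is this passage {wording} to the query? Answer Yes or No.")
    return "\n".join(parts)


def listwise_prompt(role, wording, query, passages):
    """Listwise: the LLM orders all candidate passages in one prompt."""
    parts = [role] if role else []
    for i, passage in enumerate(passages, start=1):
        parts.append(f"[{i}] {passage}")
    parts.append(f"Query: {query}")
    parts.append(
        f"Rank the {len(passages)} passages above from most to least {wording} "
        "to the query, using their identifiers, e.g. [2] > [1]."
    )
    return "\n".join(parts)


def prompt_variants(query, passages):
    """Cross both ranking styles with every role/wording combination."""
    variants = []
    for role, wording in product(ROLE_DEFINITIONS, TASK_WORDINGS):
        # One listwise prompt covers all passages for this combination.
        variants.append(("listwise", listwise_prompt(role, wording, query, passages)))
        # Pointwise needs one prompt per passage for the same combination.
        for passage in passages:
            variants.append(("pointwise", pointwise_prompt(role, wording, query, passage)))
    return variants


if __name__ == "__main__":
    query = "what is bm25"
    passages = ["BM25 is a ranking function.", "Paris is in France."]
    for style, prompt in prompt_variants(query, passages)[:2]:
        print(f"--- {style} ---\n{prompt}\n")
```

With three role options and two wordings, each ranking algorithm is already
paired with six distinct prompt variants. A fair comparison between pointwise
and listwise rankers would therefore need to control for these choices rather
than fixing one wording per method.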