Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training
arXiv (2024)
Abstract
Large language models (LLMs) aligned through reinforcement learning from
human feedback (RLHF) have quickly become one of the dominant paradigms for
building intelligent conversational assistant agents. However, despite their
strong performance across many benchmarks, LLM-based agents still lack
conversational skills such as disambiguation. When generalized assistants are
faced with ambiguity, they often overhedge or implicitly guess users'
ground-truth intents rather than asking clarification questions. In
task-specific settings, high-quality conversation samples are often limited,
which impairs models' ability to learn optimal dialogue action policies. We propose
Action-Based Contrastive Self-Training (henceforth ACT), a quasi-online
preference optimization algorithm based on Direct Preference Optimization (DPO)
which allows for sample-efficient dialogue policy learning in multi-turn
conversation. We demonstrate ACT's efficacy under sample-efficient conditions
in three difficult conversational tasks: tabular-grounded question-answering,
machine reading comprehension, and AmbigSQL, a novel task for disambiguating
information-seeking requests for text-to-SQL generation. Additionally, we
propose evaluating LLMs' ability to function as conversational agents by
examining whether they can implicitly recognize and reason about ambiguity in
conversation. ACT demonstrates substantial conversation modeling improvements
over standard approaches to supervised fine-tuning and DPO.
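
The abstract describes ACT as built on Direct Preference Optimization. For orientation, the following is a minimal sketch of the standard per-example DPO loss only, not of ACT itself; the function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions and do not come from the paper.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under a frozen reference model.
    All names and the default beta are hypothetical, for illustration only.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # Loss is -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference model agree on both responses, the margin is zero and the loss equals log 2; the loss shrinks as the policy assigns relatively more probability to the chosen response. How ACT constructs its contrastive pairs from dialogue actions is not specified in the abstract and is not modeled here.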