User Feedback-based Online Learning for Intent Classification

Kaan Gonc,Baturay Saglam,Onat Dalmaz,Tolga Cukur,Suleyman S. Kozat,Hamdi Dibeklioglu

PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023（2023）

引用 0|浏览5

暂无评分

摘要

Intent classification is a key task in natural language processing (NLP) that aims to infer the goal or intention behind a user's query. Most existing intent classification methods rely on supervised deep models trained on large annotated datasets of text-intent pairs. However, obtaining such datasets is often expensive and impractical in real-world settings. Furthermore, supervised models may overfit or face distributional shifts when new intents, utterances, or data distributions emerge over time, requiring frequent retraining. Online learning methods based on user feedback can overcome this limitation, as they do not need access to intents while collecting data and adapting the model continuously. In this paper, we propose a novel multi-armed contextual bandit framework that leverages a text encoder based on a large language model (LLM) to extract the latent features of a given utterance and jointly learn multimodal representations of encoded text features and intents. Our framework consists of two stages: offline pretraining and online fine-tuning. In the offline stage, we train the policy on a small labeled dataset using a contextual bandit approach. In the online stage, we fine-tune the policy parameters using the REINFORCE algorithm with a user feedback-based objective, without relying on the true intents. We further introduce a sliding window strategy for simulating the retrieval of data samples during online training. This novel two-phase approach enables our method to efficiently adapt to dynamic user preferences and data distributions with improved performance. An extensive set of empirical studies indicate that our method significantly outperforms policies that omit either offline pretraining or online fine-tuning, while achieving competitive performance to a supervised benchmark trained on an order of magnitude larger labeled dataset.

查看译文

关键词

Online Learning,Contextual Bandits,Intent Classification,Multimodal Learning

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要