Reward estimation for dialogue policy optimisation.
Computer Speech & Language (2018)
Abstract
• Off-line neural network-based reward model.
• On-line Gaussian process-based reward model.
• Neural network-based dialogue embedding.
• Human user evaluation.
Keywords
Dialogue systems, Reinforcement learning, Deep learning, Reward estimation, Gaussian process, Active learning