Reward estimation for dialogue policy optimisation.

Computer Speech & Language (2018)

Abstract
• Off-line neural network-based reward model.
• On-line Gaussian process-based reward model.
• Neural network-based dialogue embedding.
• Human user evaluation.
Keywords
Dialogue systems, Reinforcement learning, Deep learning, Reward estimation, Gaussian process, Active learning