Constraint-aware Policy Optimization to Solve the Vehicle Routing Problem with Time Windows

Renchi Zhang, Runsheng Yu, Wei Xia

Information Technology and Control (2022)

Abstract
The vehicle routing problem with time windows (VRPTW), one of the best-known combinatorial optimization (CO) problems, is difficult in practice; the main challenge is to find approximate solutions within a reasonable time. In recent years, reinforcement learning (RL) based methods have gained increasing attention for many CO problems, such as the vehicle routing problem (VRP), because of their potential to generate high-quality solutions efficiently. However, previous approaches neglect the information linking the constraints to the solutions, which leaves their performance unsatisfactory on strongly constrained problems such as VRPTW. We present constraint-aware policy optimization (CPO) for VRPTW, which lets the agent learn the constraints as a representation of the whole environment in order to improve the generalization of RL methods. Extensive experiments on both the Solomon benchmark and generated datasets demonstrate that our approach significantly outperforms competing methods.
Keywords
Deep reinforcement learning, Pointer network, Vehicle routing problem, Gradient methods, Kullback-Leibler divergence, Constrained Markov decision processes