Minimizing the Outage Probability in a Markov Decision Process

arxiv(2023)

Abstract
Standard Markov decision process (MDP) and reinforcement learning algorithms optimize the policy with respect to the expected gain. We propose an algorithm that makes it possible to optimize an alternative objective: the probability that the gain is greater than a given value (equivalently, minimizing the outage probability that it is not). The algorithm can be seen as an extension of the value iteration algorithm. We also show how the proposed algorithm could be generalized to use neural networks, similarly to the deep Q-learning extension of Q-learning.
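To illustrate how this objective differs from the expected gain, here is a minimal sketch under stated assumptions. It is not the paper's algorithm. It uses the common trick of augmenting the state with the remaining threshold, then applies a finite-horizon, value-iteration-style backward backup to that augmented state. The toy MDP (`TRANSITIONS`), its integer rewards, and the helper names `success_prob`, `q_value` and `greedy_action` are all hypothetical.

```python
from functools import lru_cache

# Hypothetical toy MDP: TRANSITIONS[state][action] = [(prob, next_state, reward), ...]
# "safe" pays 2 for sure (expected 2); "risky" pays 3 or 0 (expected 1.5).
TRANSITIONS = {
    "s0": {"safe": [(1.0, "s1", 2)], "risky": [(0.5, "s1", 3), (0.5, "s1", 0)]},
    "s1": {"safe": [(1.0, "end", 2)], "risky": [(0.5, "end", 3), (0.5, "end", 0)]},
    "end": {},  # terminal state: no actions available
}


@lru_cache(maxsize=None)
def success_prob(state, remaining, steps):
    """Max over policies of P(gain collected in `steps` steps > remaining).

    The augmented state is (state, remaining); 1 - success_prob is the
    minimal outage probability for that threshold.
    """
    actions = TRANSITIONS[state]
    if steps == 0 or not actions:
        # No more reward can be collected: success iff the threshold was exceeded.
        return 1.0 if remaining < 0 else 0.0
    return max(q_value(state, a, remaining, steps) for a in actions)


def q_value(state, action, remaining, steps):
    """Success probability of taking `action` now and acting optimally afterwards.

    The remaining threshold shrinks by the reward just received.
    """
    return sum(
        p * success_prob(next_state, remaining - r, steps - 1)
        for p, next_state, r in TRANSITIONS[state][action]
    )


def greedy_action(state, remaining, steps):
    """Action maximizing the success probability from the augmented state."""
    return max(TRANSITIONS[state], key=lambda a: q_value(state, a, remaining, steps))


print(success_prob("s0", 5, 2))   # 0.25
print(greedy_action("s0", 5, 2))  # risky
```

Over two steps, always choosing "safe" maximizes the expected gain (4 versus 3) but yields a gain of exactly 4, so P(gain > 5) = 0. The threshold-aware policy instead starts with "risky" and exceeds 5 with probability 0.25. Because the action depends on the remaining threshold, the policy is defined on the augmented state rather than on the original state alone.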
Keywords
Markov decision process, value iteration, neural network, outage probability