Performance of NPG in Countable State-Space Average-Cost RL
CoRR (2024)
Abstract
We consider policy optimization methods in reinforcement learning settings
where the state space is arbitrarily large, or even countably infinite. The
motivation arises from control problems in communication networks, matching
markets, and other queueing systems. We study the Natural Policy Gradient (NPG)
algorithm, a popular method for finite state spaces. Under reasonable
assumptions, we derive a performance bound for NPG that is independent of the
size of the state space, provided the error in policy evaluation is within a
factor of the true value function. We obtain this result by establishing new
policy-independent bounds on the solution to Poisson's equation, i.e., the
relative value function, and by combining these bounds with previously known
connections between MDPs and learning from experts.
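The connection to learning from experts mentioned above can be made concrete: for tabular softmax policies, the NPG update reduces to a multiplicative-weights step on the action probabilities, weighted by the current Q-values. Below is a minimal sketch on a hypothetical two-state, two-action MDP (all transition probabilities and rewards are illustrative, and exact discounted policy evaluation is used as a simple stand-in for the average-cost evaluation treated in the paper):

```python
import numpy as np

# Hypothetical toy MDP (illustrative numbers, not from the paper):
# P[s, a, s'] = transition probabilities, r[s, a] = one-step rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9  # discount factor; stand-in for the average-cost setting

def q_values(pi):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = r_pi, then Q."""
    P_pi = np.einsum('sab,sa->sb', P, pi)   # state-to-state kernel under pi
    r_pi = np.einsum('sa,sa->s', r, pi)     # expected one-step reward under pi
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
    return r + gamma * P @ V                # Q[s, a]

# NPG with a tabular softmax policy is a multiplicative-weights update:
#   pi_{t+1}(a|s) ∝ pi_t(a|s) * exp(eta * Q_t(s, a))
pi = np.full((2, 2), 0.5)   # start from the uniform policy
eta = 0.5                   # step size
for _ in range(50):
    Q = q_values(pi)
    pi = pi * np.exp(eta * Q)
    pi /= pi.sum(axis=1, keepdims=True)     # renormalize each state's row
```

After a few dozen iterations the policy concentrates on the high-reward action in each state; this exponential-weights form is exactly the experts-style update that the paper's analysis builds on.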