A structure-aware online learning algorithm for Markov decision processes

A. Roy, V. Borkar, A. Karandikar, P. Chaporkar
Proceedings of the 12th EAI International Conference on Performance …, 2019 (dl.acm.org)
To overcome the curse of dimensionality and the curse of modeling in Dynamic Programming (DP) methods for solving classical Markov Decision Process (MDP) problems, Reinforcement Learning (RL) algorithms are popular. In this paper, we consider an infinite-horizon average-reward MDP problem and prove the optimality of the threshold policy under certain conditions. Traditional RL techniques do not exploit the threshold nature of the optimal policy while learning. We propose a new RL algorithm which utilizes the known threshold structure of the optimal policy while learning by reducing the feasible policy space. We establish that the proposed algorithm converges to the optimal policy. It provides a significant improvement in convergence speed and in computational and storage complexity over traditional RL algorithms. The proposed technique can be applied to a wide variety of optimization problems, including energy-efficient data transmission and management of queues. We exhibit the improvement in convergence speed of the proposed algorithm over other RL algorithms through simulations.
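To illustrate the idea of exploiting threshold structure (this is a hedged toy sketch, not the paper's algorithm): when the optimal policy is known to be of threshold type, the feasible policy space collapses from exponentially many state-to-action maps to one candidate policy per threshold value, so a learner can search directly over thresholds. The admission-control queue below, with its arrival/service probabilities, holding cost, and reward values, is entirely hypothetical.

```python
import random

def average_reward(threshold, queue_capacity=10, steps=20000, seed=0):
    """Estimate the long-run average reward of a threshold policy by simulation.

    Toy admission-control queue (assumed example): admit an arrival iff the
    queue length is below `threshold`; earn reward 1 per admission and pay a
    holding cost of 0.1 per queued job per step.
    """
    rng = random.Random(seed)
    q, total = 0, 0.0
    for _ in range(steps):
        if rng.random() < 0.6:              # arrival with probability 0.6
            if q < threshold:               # threshold policy: admit iff below cutoff
                q = min(q + 1, queue_capacity)
                total += 1.0                # admission reward
        if q > 0 and rng.random() < 0.5:    # service completion with probability 0.5
            q -= 1
        total -= 0.1 * q                    # holding cost on the current queue
    return total / steps

# Structure-aware search: evaluate only the (capacity + 1) threshold policies
# instead of all 2^|S| state-to-action maps.
best = max(range(11), key=lambda t: average_reward(t))
print("best threshold:", best)
```

The paper's algorithm learns online rather than by enumerating and simulating each threshold, but the sketch shows why the restricted policy space cuts both the search effort and the storage needed per policy.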