J He, H Zhong, Z Yang - arXiv preprint arXiv:2404.12648, 2024 - arxiv.org
We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation. Specifically, we propose a novel algorithmic …
T Li, F Wu, G Lan - Mathematics of Operations Research, 2024 - pubsonline.informs.org
We study average-reward Markov decision processes (AMDPs) and develop novel first-order methods with strong theoretical guarantees for both policy optimization and policy …
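For orientation, here is a hedged sketch of one common first-order policy update for AMDPs: a tabular mirror-descent step in KL geometry that reweights each state's action distribution by exponentiated Q-estimates. The step size eta and the Q-table are illustrative placeholders, not the methods or guarantees from the entry above.

```python
import numpy as np

def mirror_descent_step(pi, Q, eta=0.1):
    """One tabular policy-mirror-descent update in KL geometry:
    pi_new(a|s) is proportional to pi(a|s) * exp(eta * Q[s, a])."""
    logits = np.log(pi + 1e-12) + eta * Q
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    pi_new = np.exp(logits)
    return pi_new / pi_new.sum(axis=1, keepdims=True)

# toy usage: 3 states, 2 actions, uniform initial policy
pi = np.full((3, 2), 0.5)
Q = np.array([[1.0, 0.0], [0.2, 0.8], [0.5, 0.5]])
print(mirror_descent_step(pi, Q))
```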
We present two Policy Gradient-based methods with general parameterization in the context of infinite-horizon average-reward Markov Decision Processes. The first approach employs …
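As a rough illustration of the setting in this entry, below is a minimal average-reward policy-gradient step under a tabular softmax parameterization; the gain estimator, the reward-minus-gain advantage signal, and the learning rate are assumptions made for the sketch, not either of the two approaches described above.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def pg_step(theta, trajectory, lr=0.01):
    """One policy-gradient step for the average-reward objective with a
    tabular softmax policy pi(a|s) = softmax(theta[s]).

    trajectory : list of (state, action, reward) tuples from rolling out pi
    """
    rewards = np.array([r for _, _, r in trajectory])
    gain_hat = rewards.mean()                 # crude estimate of the average reward
    grad = np.zeros_like(theta)
    for s, a, r in trajectory:
        score = -softmax(theta[s])            # gradient of log pi(a|s) w.r.t. theta[s]
        score[a] += 1.0
        grad[s] += (r - gain_hat) * score     # reward minus gain as the advantage signal
    return theta + lr * grad / len(trajectory)

# toy usage: 2 states, 2 actions
theta = np.zeros((2, 2))
traj = [(0, 1, 1.0), (1, 0, 0.0), (0, 1, 1.0)]
theta = pg_step(theta, traj)
```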
P Agrawal, S Agrawal - arXiv preprint arXiv:2407.13743, 2024 - arxiv.org
We present an optimistic Q-learning algorithm for regret minimization in average reward reinforcement learning under an additional assumption on the underlying MDP that for all …
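To fix ideas, here is a hedged sketch of the generic optimistic Q-learning pattern: a tabular update whose target carries a count-based bonus c/sqrt(n), with a discount factor close to one standing in for the average-reward objective. The bonus form, step size, and discounted proxy are simplifications for illustration, not the algorithm or the assumption from the entry above.

```python
import numpy as np
from collections import defaultdict

def optimistic_q_update(Q, counts, s, a, r, s_next, gamma=0.99, c=1.0):
    """One tabular Q-update with a count-based optimism bonus.

    Q      : mapping state -> array of action values
    counts : mapping (state, action) -> visit count
    The bonus c / sqrt(n) keeps rarely tried actions optimistic; the discount
    factor near 1 serves only as a proxy for the average-reward objective here.
    """
    counts[(s, a)] += 1
    n = counts[(s, a)]
    alpha = 1.0 / n                     # decaying step size
    bonus = c / np.sqrt(n)              # exploration bonus
    target = r + bonus + gamma * Q[s_next].max()
    Q[s][a] += alpha * (target - Q[s][a])

# toy usage: two states, two actions
Q = defaultdict(lambda: np.zeros(2))
counts = defaultdict(int)
optimistic_q_update(Q, counts, s=0, a=1, r=1.0, s_next=1)
```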
Reinforcement learning that uses kernel ridge regression to predict the expected value function is a powerful method with great representational capacity. This setting is a …
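Since this entry names kernel ridge regression for value prediction, a self-contained sketch of fitting an RBF-kernel ridge regressor to (state, value-target) pairs follows; the kernel choice, bandwidth, and regularization strength are arbitrary illustrative settings, not the paper's construction.

```python
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def fit_kernel_ridge(states, targets, lam=1e-2, bandwidth=1.0):
    """Fit kernel ridge regression to value targets: alpha = (K + lam I)^-1 y."""
    K = rbf_kernel(states, states, bandwidth)
    alpha = np.linalg.solve(K + lam * np.eye(len(states)), targets)
    def predict(query_states):
        return rbf_kernel(query_states, states, bandwidth) @ alpha
    return predict

# toy usage: 1-D states with noisy value targets
states = np.linspace(0, 1, 20).reshape(-1, 1)
targets = np.sin(2 * np.pi * states[:, 0]) + 0.1 * np.random.randn(20)
V = fit_kernel_ridge(states, targets)
print(V(np.array([[0.25], [0.75]])))
```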
Reinforcement Learning (RL) serves as a versatile framework for sequential decision-making, finding applications across diverse domains such as robotics, autonomous driving …
W Chae, K Hong, Y Zhang, A Tewari, D Lee - arXiv preprint arXiv …, 2024 - arxiv.org
This paper proposes a computationally tractable algorithm for learning infinite-horizon average-reward linear mixture Markov decision processes (MDPs) under the Bellman …
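For background, linear mixture MDPs posit transitions of the form P(s'|s,a) = <phi(s,a,s'), theta*>, and most algorithms in this family estimate theta* by value-targeted ridge regression. The sketch below shows only that regression step, with feature construction left to the caller; the shapes and regularizer are assumptions for illustration, not the construction from the entry above.

```python
import numpy as np

def value_targeted_ridge(features, value_targets, lam=1.0):
    """Estimate the mixture parameter theta by value-targeted regression.

    features      : (n, d) rows x_t = sum_{s'} phi(s_t, a_t, s') * V(s'),
                    i.e. the expected value under each candidate basis model
    value_targets : (n,) realized next-state values V(s_{t+1})
    Returns the ridge solution theta_hat = (X^T X + lam I)^-1 X^T y.
    """
    X, y = np.asarray(features), np.asarray(value_targets)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```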
K Hong, Y Zhang, A Tewari - arXiv preprint arXiv:2405.15050, 2024 - arxiv.org
We resolve the open problem of designing a computationally efficient algorithm for infinite-horizon average-reward linear Markov Decision Processes (MDPs) with $\widetilde{O}(\sqrt …
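For contrast with the mixture model above, linear MDPs assume the Q-functions are (near-)linear in a known feature map phi(s,a), and the workhorse computation is a ridge regression of Bellman targets onto those features. The sketch below shows that regression step only; the feature map, targets, and regularizer are illustrative placeholders, not the algorithm resolved in this entry.

```python
import numpy as np

def lsvi_ridge_step(Phi, bellman_targets, lam=1.0):
    """One least-squares value-iteration regression for a linear MDP.

    Phi             : (n, d) matrix whose rows are features phi(s_t, a_t)
    bellman_targets : (n,) targets r_t + max_a Q_hat(s_{t+1}, a) computed
                      by the caller from the previous iterate
    Returns w_hat so that Q(s, a) is approximated by phi(s, a) @ w_hat.
    """
    Phi = np.asarray(Phi)
    d = Phi.shape[1]
    A = Phi.T @ Phi + lam * np.eye(d)
    return np.linalg.solve(A, Phi.T @ np.asarray(bellman_targets))
```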
In this paper, we investigate the concentration properties of cumulative rewards in Markov Decision Processes (MDPs), focusing on both asymptotic and non-asymptotic settings. We …
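As baseline intuition for the concentration question, the standard Hoeffding bound for i.i.d. rewards in [0, 1] is reproduced below; the Markovian (dependent) rewards studied in this entry require different arguments and typically pick up mixing-time factors, so this is only a reference point.

```latex
% Hoeffding bound for i.i.d. rewards r_t \in [0,1] with mean \mu (baseline, not the Markovian case):
\Pr\!\left( \Bigl| \sum_{t=1}^{T} r_t - T\mu \Bigr| \ge \varepsilon \right)
  \le 2\exp\!\left( -\frac{2\varepsilon^{2}}{T} \right).
```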