Provable policy gradient methods for average-reward Markov potential games

M Cheng, R Zhou, PR Kumar… - … Conference on Artificial …, 2024 - proceedings.mlr.press
We study Markov potential games under the infinite horizon average reward criterion. Most
previous studies have been for discounted rewards. We prove that both algorithms based on …

Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation

J He, H Zhong, Z Yang - arXiv preprint arXiv:2404.12648, 2024 - arxiv.org
We study infinite-horizon average-reward Markov decision processes (AMDPs) in the
context of general function approximation. Specifically, we propose a novel algorithmic …

Stochastic first-order methods for average-reward Markov decision processes

T Li, F Wu, G Lan - Mathematics of Operations Research, 2024 - pubsonline.informs.org
We study average-reward Markov decision processes (AMDPs) and develop novel
first-order methods with strong theoretical guarantees for both policy optimization and policy …

Variance-Reduced Policy Gradient Approaches for Infinite Horizon Average Reward Markov Decision Processes

S Ganesh, WU Mondal, V Aggarwal - arXiv preprint arXiv:2404.02108, 2024 - arxiv.org
We present two Policy Gradient-based methods with general parameterization in the context
of infinite horizon average reward Markov Decision Processes. The first approach employs …

Optimistic Q-learning for average reward and episodic reinforcement learning

P Agrawal, S Agrawal - arXiv preprint arXiv:2407.13743, 2024 - arxiv.org
We present an optimistic Q-learning algorithm for regret minimization in average reward
reinforcement learning under an additional assumption on the underlying MDP that for all …

Kernel-Based Function Approximation for Average Reward Reinforcement Learning: An Optimist No-Regret Algorithm

S Vakili, J Olkhovskaya - arXiv preprint arXiv:2410.23498, 2024 - arxiv.org
Reinforcement learning utilizing kernel ridge regression to predict the expected value
function represents a powerful method with great representational capacity. This setting is a …

Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms

V Aggarwal, WU Mondal, Q Bai - arXiv preprint arXiv:2406.11481, 2024 - arxiv.org
Reinforcement Learning (RL) serves as a versatile framework for sequential
decision-making, finding applications across diverse domains such as robotics, autonomous driving …

Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span

W Chae, K Hong, Y Zhang, A Tewari, D Lee - arXiv preprint arXiv …, 2024 - arxiv.org
This paper proposes a computationally tractable algorithm for learning infinite-horizon
average-reward linear mixture Markov decision processes (MDPs) under the Bellman …

Provably Efficient Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs

K Hong, Y Zhang, A Tewari - arXiv preprint arXiv:2405.15050, 2024 - arxiv.org
We resolve the open problem of designing a computationally efficient algorithm for
infinite-horizon average-reward linear Markov Decision Processes (MDPs) with $\widetilde {O}(\sqrt …

Concentration of Cumulative Reward in Markov Decision Processes

B Sayedana, PE Caines, A Mahajan - arXiv preprint arXiv:2411.18551, 2024 - arxiv.org
In this paper, we investigate the concentration properties of cumulative rewards in Markov
Decision Processes (MDPs), focusing on both asymptotic and non-asymptotic settings. We …