C Alfano, R Yuan, P Rebeschini - Advances in Neural …, 2023 - proceedings.neurips.cc
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe their success to the use of parameterized policies. However, while theoretical guarantees …
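The clipped surrogate objective that PPO optimizes over a parameterized policy is easy to state concretely. Below is a minimal NumPy sketch, not taken from the paper; the function name, the default epsilon of 0.2, and the negated-mean formulation are illustrative assumptions.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate (illustrative sketch, not the paper's code).

    ratio:     pi_theta(a|s) / pi_theta_old(a|s) for a batch of samples
    advantage: advantage estimates A(s, a) for the same batch
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO maximizes the elementwise minimum; returning the negated mean
    # lets a gradient-descent optimizer perform ascent on the objective.
    return -np.mean(np.minimum(unclipped, clipped))
```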
H Tian, A Olshevsky… - Advances in neural …, 2024 - proceedings.neurips.cc
The early theory of actor-critic methods considered convergence using linear function approximators for the policy and value functions. Recent work has established convergence …
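The setting this early theory analyzes can be made concrete with a single update step: a TD(0) critic and a softmax policy-gradient actor, both linear in a shared feature map. The names, step sizes, and the use of the TD error as the advantage signal below are illustrative assumptions, not the paper's.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def actor_critic_step(phi, phi_next, action, reward, w, theta,
                      gamma=0.99, alpha_w=0.05, alpha_theta=0.01):
    """One actor-critic update with linear function approximation:
    critic v(s) = w @ phi(s), actor pi(a|s) = softmax(theta @ phi(s))."""
    # Critic: semi-gradient TD(0) on the linear value estimate.
    delta = reward + gamma * (w @ phi_next) - (w @ phi)
    w = w + alpha_w * delta * phi
    # Actor: policy-gradient step, using the TD error as the advantage.
    probs = softmax(theta @ phi)                  # theta: (n_actions, d)
    grad_log = -probs[:, None] * phi[None, :]     # d/dtheta of log pi(action|s)
    grad_log[action] += phi
    theta = theta + alpha_theta * delta * grad_log
    return w, theta
```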
AD Kara, S Yuksel - arXiv preprint arXiv:2412.06735, 2024 - arxiv.org
In this review/tutorial article, we present recent progress on optimal control of partially observed Markov Decision Processes (POMDPs). We first present regularity and continuity …
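The regularity and continuity questions for POMDPs are typically posed on the belief-state reformulation. For reference, the standard belief filter in generic notation, with transition kernel $P$ and observation kernel $O$ (textbook material, not a formula quoted from the article):

```latex
b_{t+1}(s') \;=\;
\frac{O(o_{t+1}\mid s',a_t)\,\sum_{s} P(s'\mid s,a_t)\, b_t(s)}
     {\sum_{s''} O(o_{t+1}\mid s'',a_t)\,\sum_{s} P(s''\mid s,a_t)\, b_t(s)} .
```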
The current state-of-the-art theoretical analysis of actor-critic (AC) algorithms lags significantly behind the practical aspects of AC implementations. This crucial gap needs …
Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and any value-based method as …
In this article, we study the dynamics of temporal-difference (TD) learning with neural network-based value function approximation over a general state space, namely, neural TD …
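A minimal sketch of the neural TD iteration this line of work studies: semi-gradient updates in which the bootstrapped target is held fixed (detached) during backpropagation. The network width, state dimension, and learning rate below are illustrative assumptions.

```python
import torch
import torch.nn as nn

value_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(value_net.parameters(), lr=1e-3)
gamma = 0.99

def neural_td_step(s, r, s_next):
    """One semi-gradient TD(0) step with a neural value function.
    The target r + gamma * v(s') is detached, so gradients flow
    only through the prediction v(s)."""
    v = value_net(s)
    with torch.no_grad():
        target = r + gamma * value_net(s_next)
    loss = 0.5 * (target - v).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```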
Actor-critic (AC) is a powerful method for learning an optimal policy in reinforcement learning, where the critic uses algorithms, e.g., temporal difference (TD) learning with function …
Policy gradient methods equipped with deep neural networks have achieved great success in solving high-dimensional reinforcement learning (RL) problems. However, current …
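The basic estimator these deep policy-gradient methods build on is the score-function (REINFORCE) gradient. A hedged PyTorch sketch follows; normalizing returns as a crude baseline is an illustrative choice, not something the snippet specifies.

```python
import torch

def reinforce_loss(log_probs, returns):
    """Score-function surrogate: minimizing this loss performs stochastic
    gradient ascent on expected return (illustrative sketch).

    log_probs: log pi_theta(a_t|s_t) for the sampled actions (requires grad)
    returns:   Monte Carlo returns for the same timesteps, over a batch
    """
    # Normalizing returns acts as a crude baseline and reduces variance.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    return -(log_probs * returns).sum()
```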
S Cayci, A Eryilmaz - arXiv preprint arXiv:2405.18221, 2024 - arxiv.org
In this paper, we study a natural policy gradient method based on recurrent neural networks (RNNs) for partially-observable Markov decision processes, whereby RNNs are used for …
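A hedged sketch of the kind of architecture described here: a GRU whose hidden state serves as an internal summary of the observation history in a POMDP, feeding a categorical action head. The class name, layer sizes, and the choice of GRU over other recurrent cells are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class RNNPolicy(nn.Module):
    """Recurrent policy for a POMDP (illustrative sketch): the hidden
    state compresses the observation history into a statistic used
    for action selection."""
    def __init__(self, obs_dim, n_actions, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, h=None):
        out, h = self.rnn(obs_seq, h)   # out: (batch, time, hidden)
        logits = self.head(out)         # per-step action logits
        return torch.distributions.Categorical(logits=logits), h
```

Carrying the hidden state `h` across calls is what lets the policy condition on the full observation history rather than the current observation alone.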