On the sample complexity of actor-critic method for reinforcement learning with function approximation

H Kumar, A Koppel, A Ribeiro - Machine Learning, 2023 - Springer
Reinforcement learning, mathematically described by Markov Decision Problems, may be
approached either through dynamic programming or policy search. Actor-critic algorithms …

Stochastic policy gradient ascent in reproducing kernel Hilbert spaces

S Paternain, JA Bazerque, A Small… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Reinforcement learning consists of finding policies that maximize an expected cumulative
long-term reward in a Markov decision process with unknown transition probabilities and …

Convergence and iteration complexity of policy gradient method for infinite-horizon reinforcement learning

K Zhang, A Koppel, H Zhu… - 2019 IEEE 58th …, 2019 - ieeexplore.ieee.org
We focus on policy search in reinforcement learning problems over continuous spaces,
where the value is defined by infinite-horizon discounted reward accumulation. This is the …

Policy evaluation in continuous MDPs with efficient kernelized gradient temporal difference

A Koppel, G Warnell, E Stump, P Stone… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
We consider policy evaluation in infinite-horizon discounted Markov decision problems with
continuous compact state and action spaces. We reformulate this task as a compositional …

Policy gradient using weak derivatives for reinforcement learning

S Bhatt, A Koppel… - 2019 IEEE 58th …, 2019 - ieeexplore.ieee.org
This paper considers policy search in continuous state-action reinforcement learning
problems. Typically, one computes search directions using a classic expression for the …

Stability and Generalization of Stochastic Compositional Gradient Descent Algorithms

M Yang, X Wei, T Yang, Y Ying - arXiv preprint arXiv:2307.03357, 2023 - arxiv.org
Many machine learning tasks can be formulated as a stochastic compositional optimization
(SCO) problem such as reinforcement learning, AUC maximization, and meta-learning …

Multi-task reinforcement learning in reproducing kernel Hilbert spaces via cross-learning

J Cervino, JA Bazerque… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Reinforcement learning is a framework to optimize an agent's policy using rewards that are
revealed by the system as a response to an action. In its standard form, reinforcement …

Matrix low-rank approximation for policy gradient methods

S Rozada, AG Marques - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Estimating a policy that maps states to actions is a central problem in reinforcement learning.
Traditionally, policies are inferred from the so-called value functions (VFs), but exact VF …

UAV Local Path Planning Based on Improved Proximal Policy Optimization Algorithm

J Xu, X Yan, C Peng, X Wu, L Gu… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Recently, more and more researchers have used deep reinforcement learning (DRL) to
solve the UAV local path planning problem. However, existing DRL didn't consider the …

Nonparametric compositional stochastic optimization

AS Bedi, A Koppel, K Rajawat - arXiv preprint arXiv:1902.06011, 2019 - researchgate.net
In this work, we address optimization problems where the objective function is a nonlinear
function of an expected value, i.e., compositional stochastic strongly convex programs. We …