Nonparametric stochastic compositional gradient descent for Q-learning in continuous Markov...

On the sample complexity of actor-critic method for reinforcement learning with function approximation

H Kumar, A Koppel, A Ribeiro - Machine Learning, 2023 - Springer

Reinforcement learning, mathematically described by Markov Decision Problems, may be
approached either through dynamic programming or policy search. Actor-critic algorithms …

Cited by 109 Related articles All 5 versions

[PDF] arxiv.org

Stochastic policy gradient ascent in reproducing kernel Hilbert spaces

S Paternain, JA Bazerque, A Small… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

Reinforcement learning consists of finding policies that maximize an expected cumulative
long-term reward in a Markov decision process with unknown transition probabilities and …

Cited by 26 Related articles All 3 versions

[PDF] nsf.gov

Convergence and iteration complexity of policy gradient method for infinite-horizon reinforcement learning

K Zhang, A Koppel, H Zhu… - 2019 IEEE 58th …, 2019 - ieeexplore.ieee.org

We focus on policy search in reinforcement learning problems over continuous spaces,
where the value is defined by infinite-horizon discounted reward accumulation. This is the …

Cited by 17 Related articles All 4 versions

[PDF] ieee.org

Policy evaluation in continuous MDPs with efficient kernelized gradient temporal difference

A Koppel, G Warnell, E Stump, P Stone… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

We consider policy evaluation in infinite-horizon discounted Markov decision problems with
continuous compact state and action spaces. We reformulate this task as a compositional …

Cited by 17 Related articles All 5 versions

[PDF] arxiv.org

Policy gradient using weak derivatives for reinforcement learning

S Bhatt, A Koppel… - 2019 IEEE 58th …, 2019 - ieeexplore.ieee.org

This paper considers policy search in continuous state-action reinforcement learning
problems. Typically, one computes search directions using a classic expression for the …

Cited by 11 Related articles All 7 versions

[PDF] arxiv.org

Stability and Generalization of Stochastic Compositional Gradient Descent Algorithms

M Yang, X Wei, T Yang, Y Ying - arXiv preprint arXiv:2307.03357, 2023 - arxiv.org

Many machine learning tasks can be formulated as a stochastic compositional optimization
(SCO) problem such as reinforcement learning, AUC maximization, and meta-learning …

Cited by 1 Related articles All 3 versions

[PDF] arxiv.org

Multi-task reinforcement learning in reproducing kernel Hilbert spaces via cross-learning

J Cervino, JA Bazerque… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Reinforcement learning is a framework to optimize an agent's policy using rewards that are
revealed by the system as a response to an action. In its standard form, reinforcement …

Cited by 8 Related articles All 5 versions

[PDF] arxiv.org

Matrix low-rank approximation for policy gradient methods

S Rozada, AG Marques - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

Estimating a policy that maps states to actions is a central problem in reinforcement learning.
Traditionally, policies are inferred from the so called value functions (VFs), but exact VF …

Cited by 1 Related articles All 3 versions

UAV Local Path Planning Based on Improved Proximal Policy Optimization Algorithm

J Xu, X Yan, C Peng, X Wu, L Gu… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

Recently, more and more researchers have used deep reinforcement learning (DRL) to
solve the UAV local path planning problem. However, existing DRL didn't consider the …

Cited by 1 Related articles

[PDF] researchgate.net

[PDF] Nonparametric compositional stochastic optimization

AS Bedi, A Koppel, K Rajawat - arXiv preprint arXiv:1902.06011, 2019 - researchgate.net

In this work, we address optimization problems where the objective function is a nonlinear
function of an expected value, i.e., compositional stochastic strongly convex programs. We …

Cited by 10 Related articles All 2 versions
