Policy optimization over general state and action spaces

C Ju, G Lan - arXiv preprint arXiv:2211.16715, 2022 - arxiv.org
Reinforcement learning (RL) problems over general state and action spaces are notoriously
challenging. In contrast to the tabular setting, one cannot enumerate all the states and then …

Policy mirror descent inherently explores action space

Y Li, G Lan - SIAM Journal on Optimization, 2025 - SIAM
Explicit exploration in the action space was assumed to be indispensable for online policy
gradient methods to avoid a drastic degradation in sample complexity, for solving general …

Stochastic first-order methods for average-reward Markov decision processes

T Li, F Wu, G Lan - Mathematics of Operations Research, 2024 - pubsonline.informs.org
We study average-reward Markov decision processes (AMDPs) and develop novel first-
order methods with strong theoretical guarantees for both policy optimization and policy …

Stochastic gradient succeeds for bandits

J Mei, Z Zhong, B Dai, A Agarwal… - International …, 2023 - proceedings.mlr.press
We show that the stochastic gradient bandit algorithm converges to a globally optimal policy
at an $O(1/t)$ rate, even with a constant step size. Remarkably, global convergence of the …

[PDF][PDF] Federated natural policy gradient methods for multi-task reinforcement learning

T Yang, S Cen, Y Wei, Y Chen… - arXiv preprint arXiv …, 2023 - yuxinchen2020.github.io
Federated reinforcement learning (RL) enables collaborative decision making of multiple
distributed agents without sharing local data trajectories. In this work, we consider a multi …

Dual Approximation Policy Optimization

Z Xiong, M Fazel, L Xiao - arXiv preprint arXiv:2410.01249, 2024 - arxiv.org
We propose Dual Approximation Policy Optimization (DAPO), a framework that incorporates
general function approximation into policy mirror descent methods. In contrast to the popular …

Global Convergence of Policy Gradient Methods in Reinforcement Learning, Games and Control

S Cen, Y Chi - arXiv preprint arXiv:2310.05230, 2023 - arxiv.org
Policy gradient methods, where one searches for the policy of interest by maximizing the
value functions using first-order information, become increasingly popular for sequential …

On the Convergence of Policy in Unregularized Policy Mirror Descent

D Lin, Z Zhang - arXiv preprint arXiv:2205.08176, 2022 - arxiv.org
In this short note, we give a convergence analysis of the policy in the recently popularized policy
mirror descent (PMD) method. We mainly consider the unregularized setting following [11] with …

[PDF][PDF] SIAG on Optimization Views and News

H Lu, S Cen, Y Chi, LN Vicente - siagoptimization.github.io
Linear programming (LP) [26, 17, 15, 51, 50, 33] is a seminal optimization problem that has
grown with today's rich and diverse optimization modeling and algorithmic landscape. LP is …