Policy optimization over general state and action spaces

C Ju, G Lan - arXiv preprint arXiv:2211.16715, 2022 - arxiv.org
Reinforcement learning (RL) problems over general state and action spaces are notoriously
challenging. In contrast to the tabular setting, one cannot enumerate all the states and then …

Policy mirror descent inherently explores action space

Y Li, G Lan - SIAM Journal on Optimization, 2025 - SIAM
Explicit exploration in the action space was assumed to be indispensable for online policy
gradient methods to avoid a drastic degradation in sample complexity, for solving general …

Stochastic first-order methods for average-reward Markov decision processes

T Li, F Wu, G Lan - Mathematics of Operations Research, 2024 - pubsonline.informs.org
We study average-reward Markov decision processes (AMDPs) and develop novel first-
order methods with strong theoretical guarantees for both policy optimization and policy …

Stochastic gradient succeeds for bandits

J Mei, Z Zhong, B Dai, A Agarwal… - International …, 2023 - proceedings.mlr.press
We show that the stochastic gradient bandit algorithm converges to a globally optimal policy
at an $O(1/t)$ rate, even with a constant step size. Remarkably, global convergence of the …

[PDF][PDF] Federated natural policy gradient methods for multi-task reinforcement learning

T Yang, S Cen, Y Wei, Y Chen… - arXiv preprint arXiv …, 2023 - yuxinchen2020.github.io
Federated reinforcement learning (RL) enables collaborative decision making of multiple
distributed agents without sharing local data trajectories. In this work, we consider a multi …

Dual Approximation Policy Optimization

Z Xiong, M Fazel, L Xiao - arXiv preprint arXiv:2410.01249, 2024 - arxiv.org
We propose Dual Approximation Policy Optimization (DAPO), a framework that incorporates
general function approximation into policy mirror descent methods. In contrast to the popular …

Global Convergence of Policy Gradient Methods in Reinforcement Learning, Games and Control

S Cen, Y Chi - arXiv preprint arXiv:2310.05230, 2023 - arxiv.org
Policy gradient methods, where one searches for the policy of interest by maximizing the
value functions using first-order information, become increasingly popular for sequential …

On the Convergence of Policy in Unregularized Policy Mirror Descent

D Lin, Z Zhang - arXiv preprint arXiv:2205.08176, 2022 - arxiv.org
In this short note, we give a convergence analysis of the policy in the recently popularized policy
mirror descent (PMD) method. We mainly consider the unregularized setting following [11] with …

[PDF][PDF] SIAG on Optimization Views and News

H Lu, S Cen, Y Chi, LN Vicente - siagoptimization.github.io
Linear programming (LP) [26, 17, 15, 51, 50, 33] is a seminal optimization problem that has
grown with today's rich and diverse optimization modeling and algorithmic landscape. LP is …