Y Li, G Lan - SIAM Journal on Optimization, 2025 - SIAM
Explicit exploration in the action space was assumed to be indispensable for online policy gradient methods to avoid a drastic degradation in sample complexity, for solving general …
T Li, F Wu, G Lan - Mathematics of Operations Research, 2024 - pubsonline.informs.org
We study average-reward Markov decision processes (AMDPs) and develop novel first-order methods with strong theoretical guarantees for both policy optimization and policy …
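As background (a standard definition, not drawn from the paper itself), the average-reward objective that such first-order methods target is commonly written as
$$ \rho^{\pi} := \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi}\Big[\sum_{t=0}^{T-1} r(s_t, a_t)\Big], $$
and policy evaluation then concerns the differential (bias) value function $V^{\pi}(s) = \mathbb{E}^{\pi}\big[\sum_{t=0}^{\infty} \big(r(s_t, a_t) - \rho^{\pi}\big) \mid s_0 = s\big]$, well defined under suitable ergodicity assumptions.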
We show that the stochastic gradient bandit algorithm converges to a globally optimal policy at an $O(1/t)$ rate, even with a constant step size. Remarkably, global convergence of the …
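For concreteness, here is a minimal sketch of one standard form of the stochastic gradient bandit algorithm (softmax parameterization, REINFORCE-style gradient estimate, constant step size); the Bernoulli reward model, the step size, and the arm means below are illustrative assumptions, not details taken from the paper.

import numpy as np

def stochastic_gradient_bandit(mean_rewards, eta=0.1, T=10_000, seed=0):
    # Softmax-parameterized gradient bandit with a constant step size eta.
    # mean_rewards are the true arm means, used only to simulate feedback.
    rng = np.random.default_rng(seed)
    K = len(mean_rewards)
    theta = np.zeros(K)                        # arm preferences
    pi = np.full(K, 1.0 / K)
    for _ in range(T):
        pi = np.exp(theta - theta.max())
        pi /= pi.sum()                         # softmax policy over arms
        a = rng.choice(K, p=pi)                # sample an arm
        r = rng.binomial(1, mean_rewards[a])   # observe a stochastic Bernoulli reward
        # REINFORCE estimate of the gradient of E_{a ~ pi}[r(a)]:
        # r * d/dtheta log pi(a) = r * (indicator(a) - pi)
        grad = r * ((np.arange(K) == a).astype(float) - pi)
        theta += eta * grad                    # constant-step-size ascent
    return pi

# The returned policy should concentrate on the best arm, e.g.
# stochastic_gradient_bandit([0.2, 0.5, 0.8]) puts nearly all mass on the third arm.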
T Yang, S Cen, Y Wei, Y Chen… - arXiv preprint arXiv …, 2023 - yuxinchen2020.github.io
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories. In this work, we consider a multi …
We propose Dual Approximation Policy Optimization (DAPO), a framework that incorporates general function approximation into policy mirror descent methods. In contrast to the popular …
S Cen, Y Chi - arXiv preprint arXiv:2310.05230, 2023 - arxiv.org
Policy gradient methods, where one searches for the policy of interest by maximizing the value functions using first-order information, have become increasingly popular for sequential …
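As a point of reference (a standard result, not specific to this article), the first-order information in question is typically supplied by the policy gradient theorem: for a parameterized policy $\pi_\theta$ in the discounted setting,
$$ \nabla_\theta V^{\pi_\theta}(\rho) = \frac{1}{1-\gamma}\, \mathbb{E}_{s \sim d_\rho^{\pi_\theta},\, a \sim \pi_\theta(\cdot \mid s)}\big[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a) \big], $$
where $d_\rho^{\pi_\theta}$ is the discounted state visitation distribution and $Q^{\pi_\theta}$ the action-value function.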
D Lin, Z Zhang - arXiv preprint arXiv:2205.08176, 2022 - arxiv.org
In this short note, we give a convergence analysis of the policy iterates in the recently proposed policy mirror descent (PMD) method. We mainly consider the unregularized setting following [11] with …
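For context (one common instantiation, not necessarily the exact setting of the note), the tabular, unregularized PMD update being analyzed takes the mirror-descent form; with the KL divergence as the Bregman distance it admits a closed-form multiplicative update:
$$ \pi_{k+1}(\cdot \mid s) = \arg\max_{p \in \Delta(\mathcal{A})} \Big\{ \eta\, \langle Q^{\pi_k}(s, \cdot),\, p \rangle - \mathrm{KL}\big(p \,\|\, \pi_k(\cdot \mid s)\big) \Big\}, \qquad \pi_{k+1}(a \mid s) \propto \pi_k(a \mid s)\, \exp\big(\eta\, Q^{\pi_k}(s, a)\big). $$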
H Lu, S Cen, Y Chi, LN Vicente, H Lu - siagoptimization.github.io
Linear programming (LP) [26, 17, 15, 51, 50, 33] is a seminal optimization problem that has grown with today's rich and diverse optimization modeling and algorithmic landscape. LP is …
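For reference, one standard form of the problem (among several equivalent formulations) is
$$ \min_{x \in \mathbb{R}^n} \ c^\top x \quad \text{subject to} \quad A x = b, \quad x \ge 0, $$
with data $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $c \in \mathbb{R}^n$.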