Federated reinforcement learning: Linear speedup under Markovian sampling

S Khodadadian, P Sharma, G Joshi… - International …, 2022 - proceedings.mlr.press
Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling
observations from the environment is usually split across multiple agents. However …

Improving sample complexity bounds for (natural) actor-critic algorithms

T Xu, Z Wang, Y Liang - Advances in Neural Information …, 2020 - proceedings.neurips.cc
The actor-critic (AC) algorithm is a popular method to find an optimal policy in reinforcement
learning. In the infinite horizon scenario, the finite-sample convergence rate for the AC and …

GradientDICE: Rethinking generalized offline estimation of stationary values

S Zhang, B Liu, S Whiteson - International Conference on …, 2020 - proceedings.mlr.press
We present GradientDICE for estimating the density ratio between the state distribution of
the target policy and the sampling distribution in off-policy reinforcement learning …

Breaking the deadly triad with a target network

S Zhang, H Yao, S Whiteson - International Conference on …, 2021 - proceedings.mlr.press
The deadly triad refers to the instability of a reinforcement learning algorithm when it
employs off-policy learning, function approximation, and bootstrapping simultaneously. In …

State dependent performative prediction with stochastic approximation

Q Li, HT Wai - International Conference on Artificial …, 2022 - proceedings.mlr.press
This paper studies the performative prediction problem which optimizes a stochastic loss
function with data distribution that depends on the decision variable. We consider a setting …

Finite-sample analysis of two-time-scale natural actor–critic algorithm

S Khodadadian, TT Doan, J Romberg… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Actor–critic style two-time-scale algorithms are one of the most popular methods in
reinforcement learning, and have seen great empirical success. However, their performance …

Finite-sample analysis of off-policy natural actor-critic algorithm

S Khodadadian, Z Chen… - … Conference on Machine …, 2021 - proceedings.mlr.press
In this paper, we provide finite-sample convergence guarantees for an off-policy variant of
the natural actor-critic (NAC) algorithm based on Importance Sampling. In particular, we …

Finite-sample analysis of off-policy natural actor–critic with linear function approximation

Z Chen, S Khodadadian… - IEEE Control Systems …, 2022 - ieeexplore.ieee.org
In this letter, we develop a novel variant of natural actor-critic algorithm using off-policy
sampling and linear function approximation, and establish a sample complexity of …

Average-reward off-policy policy evaluation with function approximation

S Zhang, Y Wan, RS Sutton… - … Conference on Machine …, 2021 - proceedings.mlr.press
We consider off-policy policy evaluation with function approximation (FA) in average-reward
MDPs, where the goal is to estimate both the reward rate and the differential value function …

Finite-time analysis of single-timescale actor-critic

X Chen, L Zhao - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Actor-critic methods have achieved significant success in many challenging applications.
However, their finite-time convergence is still poorly understood in the most practical single …