T Xu, Z Wang, Y Liang - Advances in Neural Information …, 2020 - proceedings.neurips.cc
The actor-critic (AC) algorithm is a popular method to find an optimal policy in reinforcement learning. In the infinite horizon scenario, the finite-sample convergence rate for the AC and …
S Zhang, B Liu, S Whiteson - International Conference on …, 2020 - proceedings.mlr.press
We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution in off-policy reinforcement learning …
S Zhang, H Yao, S Whiteson - International Conference on …, 2021 - proceedings.mlr.press
The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously. In …
Q Li, HT Wai - International Conference on Artificial …, 2022 - proceedings.mlr.press
This paper studies the performative prediction problem which optimizes a stochastic loss function with data distribution that depends on the decision variable. We consider a setting …
Actor–critic style two-time-scale algorithms are one of the most popular methods in reinforcement learning, and have seen great empirical success. However, their performance …
S Khodadadian, Z Chen… - … Conference on Machine …, 2021 - proceedings.mlr.press
In this paper, we provide finite-sample convergence guarantees for an off-policy variant of the natural actor-critic (NAC) algorithm based on Importance Sampling. In particular, we …
In this letter, we develop a novel variant of natural actor-critic algorithm using off-policy sampling and linear function approximation, and establish a sample complexity of …
S Zhang, Y Wan, RS Sutton… - … Conference on Machine …, 2021 - proceedings.mlr.press
We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function …
X Chen, L Zhao - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Actor-critic methods have achieved significant success in many challenging applications. However, their finite-time convergence is still poorly understood in the most practical single …