- Academic Resource Search

Federated reinforcement learning: Linear speedup under Markovian sampling

S Khodadadian, P Sharma, G Joshi… - International …, 2022 - proceedings.mlr.press

Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling
observations from the environment is usually split across multiple agents. However …

Cited by 70 | Related articles | All 7 versions

[PDF] neurips.cc

Improving sample complexity bounds for (natural) actor-critic algorithms

T Xu, Z Wang, Y Liang - Advances in Neural Information …, 2020 - proceedings.neurips.cc

The actor-critic (AC) algorithm is a popular method to find an optimal policy in reinforcement
learning. In the infinite horizon scenario, the finite-sample convergence rate for the AC and …

Cited by 120 | Related articles | All 8 versions

[PDF] mlr.press

GradientDICE: Rethinking generalized offline estimation of stationary values

S Zhang, B Liu, S Whiteson - International Conference on …, 2020 - proceedings.mlr.press

We present GradientDICE for estimating the density ratio between the state distribution of
the target policy and the sampling distribution in off-policy reinforcement learning …

Cited by 110 | Related articles | All 8 versions

[PDF] mlr.press

Breaking the deadly triad with a target network

S Zhang, H Yao, S Whiteson - International Conference on …, 2021 - proceedings.mlr.press

The deadly triad refers to the instability of a reinforcement learning algorithm when it
employs off-policy learning, function approximation, and bootstrapping simultaneously. In …

Cited by 53 | Related articles | All 7 versions

[PDF] mlr.press

State dependent performative prediction with stochastic approximation

Q Li, HT Wai - International Conference on Artificial …, 2022 - proceedings.mlr.press

This paper studies the performative prediction problem which optimizes a stochastic loss
function with data distribution that depends on the decision variable. We consider a setting …

Cited by 39 | Related articles | All 3 versions

[PDF] arxiv.org

Finite-sample analysis of two-time-scale natural actor–critic algorithm

S Khodadadian, TT Doan, J Romberg… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Actor–critic style two-time-scale algorithms are one of the most popular methods in
reinforcement learning, and have seen great empirical success. However, their performance …

Cited by 48 | Related articles | All 5 versions

[PDF] mlr.press

Finite-sample analysis of off-policy natural actor-critic algorithm

S Khodadadian, Z Chen… - … Conference on Machine …, 2021 - proceedings.mlr.press

In this paper, we provide finite-sample convergence guarantees for an off-policy variant of
the natural actor-critic (NAC) algorithm based on Importance Sampling. In particular, we …

Cited by 37 | Related articles | All 4 versions

[PDF] arxiv.org

Finite-sample analysis of off-policy natural actor–critic with linear function approximation

Z Chen, S Khodadadian… - IEEE Control Systems …, 2022 - ieeexplore.ieee.org

In this letter, we develop a novel variant of natural actor-critic algorithm using off-policy
sampling and linear function approximation, and establish a sample complexity of …

Cited by 38 | Related articles | All 4 versions

[PDF] mlr.press

Average-reward off-policy policy evaluation with function approximation

S Zhang, Y Wan, RS Sutton… - … conference on machine …, 2021 - proceedings.mlr.press

We consider off-policy policy evaluation with function approximation (FA) in average-reward
MDPs, where the goal is to estimate both the reward rate and the differential value function …

Cited by 39 | Related articles | All 8 versions

[PDF] neurips.cc

Finite-time analysis of single-timescale actor-critic

X Chen, L Zhao - Advances in Neural Information …, 2023 - proceedings.neurips.cc

Actor-critic methods have achieved significant success in many challenging applications.
However, its finite-time convergence is still poorly understood in the most practical single …

Cited by 16 | Related articles | All 8 versions
