Finite-sample analysis of proximal gradient TD algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer

Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

Cited by 1453 · Related articles · All 8 versions

[PDF] arxiv.org

Decentralized multi-agent reinforcement learning with networked agents: Recent advances

K Zhang, Z Yang, T Başar - Frontiers of Information Technology & …, 2021 - Springer

Multi-agent reinforcement learning (MARL) has long been a significant research topic in
both machine learning and control systems. Recent development of (single-agent) deep …

Cited by 92 · Related articles · All 9 versions

[PDF] neurips.cc

Online robust reinforcement learning with model uncertainty

Y Wang, S Zou - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc

Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …

Cited by 94 · Related articles · All 10 versions

[PDF] mlr.press

A finite time analysis of temporal difference learning with linear function approximation

J Bhandari, D Russo, R Singal - Conference on learning …, 2018 - proceedings.mlr.press

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …

Cited by 404 · Related articles · All 11 versions

[PDF] mlr.press

Federated reinforcement learning: Linear speedup under Markovian sampling

S Khodadadian, P Sharma, G Joshi… - International …, 2022 - proceedings.mlr.press

Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling
observations from the environment is usually split across multiple agents. However …

Cited by 58 · Related articles · All 7 versions

[PDF] mlr.press

SBEED: Convergent reinforcement learning with nonlinear function approximation

B Dai, A Shaw, L Li, L Xiao, N He… - International …, 2018 - proceedings.mlr.press

When function approximation is used, solving the Bellman optimality equation with stability
guarantees has remained a major open problem in reinforcement learning for decades. The …

Cited by 297 · Related articles · All 8 versions

[PDF] neurips.cc

Finite-sample analysis for SARSA with linear function approximation

S Zou, T Xu, Y Liang - Advances in neural information …, 2019 - proceedings.neurips.cc

SARSA is an on-policy algorithm to learn a Markov decision process policy in reinforcement
learning. We investigate the SARSA algorithm with linear function approximation under the …

Cited by 200 · Related articles · All 9 versions

[PDF] neurips.cc

Off-policy evaluation via the regularized Lagrangian

M Yang, O Nachum, B Dai, L Li… - Advances in Neural …, 2020 - proceedings.neurips.cc

The recently proposed distribution correction estimation (DICE) family of estimators has
advanced the state of the art in off-policy evaluation from behavior-agnostic data. While …

Cited by 109 · Related articles · All 8 versions

[PDF] neurips.cc

Multi-agent reinforcement learning via double averaging primal-dual optimization

HT Wai, Z Yang, Z Wang… - Advances in Neural …, 2018 - proceedings.neurips.cc

Despite the success of single-agent reinforcement learning, multi-agent reinforcement
learning (MARL) remains challenging due to complex interactions between agents …

Cited by 201 · Related articles · All 8 versions

[PDF] neurips.cc

Provably global convergence of actor-critic: A case for linear quadratic regulator with ergodic cost

Z Yang, Y Chen, M Hong… - Advances in neural …, 2019 - proceedings.neurips.cc

Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags
behind. In a broader context, actor-critic can be viewed as an online alternating update …

Cited by 142 · Related articles · All 9 versions
