Multi-agent reinforcement learning: A selective overview of theories and algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer
Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

Decentralized multi-agent reinforcement learning with networked agents: Recent advances

K Zhang, Z Yang, T Başar - Frontiers of Information Technology & …, 2021 - Springer
Multi-agent reinforcement learning (MARL) has long been a significant research topic in
both machine learning and control systems. Recent development of (single-agent) deep …

Online robust reinforcement learning with model uncertainty

Y Wang, S Zou - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …

A finite time analysis of temporal difference learning with linear function approximation

J Bhandari, D Russo, R Singal - Conference on learning …, 2018 - proceedings.mlr.press
Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …

Federated reinforcement learning: Linear speedup under Markovian sampling

S Khodadadian, P Sharma, G Joshi… - International …, 2022 - proceedings.mlr.press
Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling
observations from the environment is usually split across multiple agents. However …

SBEED: Convergent reinforcement learning with nonlinear function approximation

B Dai, A Shaw, L Li, L Xiao, N He… - International …, 2018 - proceedings.mlr.press
When function approximation is used, solving the Bellman optimality equation with stability
guarantees has remained a major open problem in reinforcement learning for decades. The …

Finite-sample analysis for SARSA with linear function approximation

S Zou, T Xu, Y Liang - Advances in neural information …, 2019 - proceedings.neurips.cc
SARSA is an on-policy algorithm to learn a Markov decision process policy in reinforcement
learning. We investigate the SARSA algorithm with linear function approximation under the …

Off-policy evaluation via the regularized Lagrangian

M Yang, O Nachum, B Dai, L Li… - Advances in Neural …, 2020 - proceedings.neurips.cc
The recently proposed distribution correction estimation (DICE) family of estimators has
advanced the state of the art in off-policy evaluation from behavior-agnostic data. While …

Multi-agent reinforcement learning via double averaging primal-dual optimization

HT Wai, Z Yang, Z Wang… - Advances in Neural …, 2018 - proceedings.neurips.cc
Despite the success of single-agent reinforcement learning, multi-agent reinforcement
learning (MARL) remains challenging due to complex interactions between agents …

Provably global convergence of actor-critic: A case for linear quadratic regulator with ergodic cost

Z Yang, Y Chen, M Hong… - Advances in neural …, 2019 - proceedings.neurips.cc
Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags
behind. In a broader context, actor-critic can be viewed as an online alternating update …