A sharp analysis of model-based reinforcement learning with self-play

Q Liu, T Yu, Y Bai, C Jin - International Conference on …, 2021 - proceedings.mlr.press
Model-based algorithms—algorithms that explore the environment through building
and utilizing an estimated model—are widely used in reinforcement learning practice and …

Provable self-play algorithms for competitive reinforcement learning

Y Bai, C Jin - International conference on machine learning, 2020 - proceedings.mlr.press
Self-play, where the algorithm learns by playing against itself without requiring any direct
supervision, has become the new weapon in modern Reinforcement Learning (RL) for …

Sharper model-free reinforcement learning for average-reward Markov decision processes

Z Zhang, Q Xie - The Thirty Sixth Annual Conference on …, 2023 - proceedings.mlr.press
We study model-free reinforcement learning (RL) algorithms for infinite-horizon average-
reward Markov decision process (MDP), which is more appropriate for applications that …

Policy optimization for Markov games: Unified framework and faster convergence

R Zhang, Q Liu, H Wang, C Xiong… - Advances in Neural …, 2022 - proceedings.neurips.cc
This paper studies policy optimization algorithms for multi-agent reinforcement learning. We
begin by proposing an algorithm framework for two-player zero-sum Markov Games in the …

Model-based multi-agent RL in zero-sum Markov games with near-optimal sample complexity

K Zhang, S Kakade, T Başar… - Advances in Neural …, 2020 - proceedings.neurips.cc
Model-based reinforcement learning (RL), which finds an optimal policy using an
empirical model, has long been recognized as one of the cornerstones of RL. It is especially …

When are offline two-player zero-sum Markov games solvable?

Q Cui, SS Du - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
We study what dataset assumption permits solving offline two-player zero-sum Markov
games. In stark contrast to the offline single-agent Markov decision process, we show that …

Actor-critic policy optimization in partially observable multiagent environments

S Srinivasan, M Lanctot, V Zambaldi… - Advances in neural …, 2018 - proceedings.neurips.cc
Optimization of parameterized policies for reinforcement learning (RL) is an important and
challenging problem in artificial intelligence. Among the most common approaches are …

Near-optimal reinforcement learning with self-play

Y Bai, C Jin, T Yu - Advances in neural information …, 2020 - proceedings.neurips.cc
This paper considers the problem of designing optimal algorithms for reinforcement learning
in two-player zero-sum games. We focus on self-play algorithms which learn the optimal …

Provably efficient offline multi-agent reinforcement learning via strategy-wise bonus

Q Cui, SS Du - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
This paper considers offline multi-agent reinforcement learning. We propose the strategy-
wise concentration principle which directly builds a confidence interval for the joint strategy …

Learning zero-sum simultaneous-move Markov games using function approximation and correlated equilibrium

Q Xie, Y Chen, Z Wang, Z Yang - Conference on learning …, 2020 - proceedings.mlr.press
In this work, we develop provably efficient reinforcement learning algorithms for two-player
zero-sum Markov games with simultaneous moves. We consider a family of Markov games …