A sharp analysis of model-based reinforcement learning with self-play

Q Liu, T Yu, Y Bai, C Jin - International Conference on …, 2021 - proceedings.mlr.press
Model-based algorithms—algorithms that explore the environment through building
and utilizing an estimated model—are widely used in reinforcement learning practice and …

Provable self-play algorithms for competitive reinforcement learning

Y Bai, C Jin - International conference on machine learning, 2020 - proceedings.mlr.press
Self-play, where the algorithm learns by playing against itself without requiring any direct
supervision, has become the new weapon in modern Reinforcement Learning (RL) for …

Sharper model-free reinforcement learning for average-reward Markov decision processes

Z Zhang, Q Xie - The Thirty Sixth Annual Conference on …, 2023 - proceedings.mlr.press
We study model-free reinforcement learning (RL) algorithms for infinite-horizon average-
reward Markov decision process (MDP), which is more appropriate for applications that …

Policy optimization for Markov games: Unified framework and faster convergence

R Zhang, Q Liu, H Wang, C Xiong… - Advances in Neural …, 2022 - proceedings.neurips.cc
This paper studies policy optimization algorithms for multi-agent reinforcement learning. We
begin by proposing an algorithm framework for two-player zero-sum Markov Games in the …

Model-based multi-agent RL in zero-sum Markov games with near-optimal sample complexity

K Zhang, S Kakade, T Başar… - Advances in Neural …, 2020 - proceedings.neurips.cc
Model-based reinforcement learning (RL), which finds an optimal policy using an
empirical model, has long been recognized as one of the cornerstones of RL. It is especially …

When are offline two-player zero-sum Markov games solvable?

Q Cui, SS Du - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
We study what dataset assumption permits solving offline two-player zero-sum Markov
games. In stark contrast to the offline single-agent Markov decision process, we show that …

Actor-critic policy optimization in partially observable multiagent environments

S Srinivasan, M Lanctot, V Zambaldi… - Advances in neural …, 2018 - proceedings.neurips.cc
Optimization of parameterized policies for reinforcement learning (RL) is an important and
challenging problem in artificial intelligence. Among the most common approaches are …

Near-optimal reinforcement learning with self-play

Y Bai, C Jin, T Yu - Advances in neural information …, 2020 - proceedings.neurips.cc
This paper considers the problem of designing optimal algorithms for reinforcement learning
in two-player zero-sum games. We focus on self-play algorithms which learn the optimal …

Provably efficient offline multi-agent reinforcement learning via strategy-wise bonus

Q Cui, SS Du - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
This paper considers offline multi-agent reinforcement learning. We propose the strategy-
wise concentration principle which directly builds a confidence interval for the joint strategy …

Learning zero-sum simultaneous-move Markov games using function approximation and correlated equilibrium

Q Xie, Y Chen, Z Wang, Z Yang - Conference on learning …, 2020 - proceedings.mlr.press
In this work, we develop provably efficient reinforcement learning algorithms for two-player
zero-sum Markov games with simultaneous moves. We consider a family of Markov games …