Independent policy gradient methods for competitive reinforcement learning

Independent policy gradient for large-scale Markov potential games: Sharper rates, function approximation, and game-agnostic convergence

D Ding, CY Wei, K Zhang… - … Conference on Machine …, 2022 - proceedings.mlr.press

We examine global non-asymptotic convergence properties of policy gradient methods for
multi-agent reinforcement learning (RL) problems in Markov potential games (MPGs). To …

Cited by 67 · Related articles · All 8 versions

Global convergence of multi-agent policy gradient in Markov potential games

S Leonardos, W Overman, I Panageas… - arXiv preprint arXiv …, 2021 - arxiv.org

Potential games are arguably one of the most important and widely studied classes of
normal form games. They define the archetypal setting of multi-agent coordination as all …

Cited by 118 · Related articles · All 7 versions

V-Learning--A Simple, Efficient, Decentralized Algorithm for Multiagent RL

C Jin, Q Liu, Y Wang, T Yu - arXiv preprint arXiv:2110.14555, 2021 - arxiv.org

A major challenge of multiagent reinforcement learning (MARL) is the curse of multiagents,
where the size of the joint action space scales exponentially with the number of agents. This …

Cited by 97 · Related articles · All 3 versions

Independent learning in stochastic games

A Ozdaglar, MO Sayin, K Zhang - International Congress of …, 2021 - ems.press

Reinforcement learning (RL) has recently achieved tremendous successes in many artificial
intelligence applications. Many of the forefront applications of RL involve multiple agents …

Cited by 27 · Related articles · All 5 versions

The complexity of Markov equilibrium in stochastic games

C Daskalakis, N Golowich… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press

We show that computing approximate stationary Markov coarse correlated equilibria (CCE)
in general-sum stochastic games is PPAD-hard, even when there are two players, the game …

Cited by 65 · Related articles · All 6 versions

Decentralized Q-learning in zero-sum Markov games

M Sayin, K Zhang, D Leslie, T Basar… - Advances in Neural …, 2021 - proceedings.neurips.cc

We study multi-agent reinforcement learning (MARL) in infinite-horizon discounted zero-sum
Markov games. We focus on the practical but challenging setting of decentralized MARL …

Cited by 91 · Related articles · All 8 versions

A minimaximalist approach to reinforcement learning from human feedback

G Swamy, C Dann, R Kidambi, ZS Wu… - arXiv preprint arXiv …, 2024 - arxiv.org

We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement
learning from human feedback. Our approach is minimalist in that it does not require training …

Cited by 27 · Related articles · All 3 versions

Last-iterate convergence of decentralized optimistic gradient descent/ascent in infinite-horizon competitive Markov games

CY Wei, CW Lee, M Zhang… - Conference on learning …, 2021 - proceedings.mlr.press

We study infinite-horizon discounted two-player zero-sum Markov games, and develop a
decentralized algorithm that provably converges to the set of Nash equilibria under self-play …

Cited by 97 · Related articles · All 4 versions

Efficient methods for structured nonconvex-nonconcave min-max optimization

J Diakonikolas, C Daskalakis… - … Conference on Artificial …, 2021 - proceedings.mlr.press

The use of min-max optimization in the adversarial training of deep neural network
classifiers, and the training of generative adversarial networks has motivated the study of …

Cited by 146 · Related articles · All 7 versions

On improving model-free algorithms for decentralized multi-agent reinforcement learning

W Mao, L Yang, K Zhang… - … Conference on Machine …, 2022 - proceedings.mlr.press

Multi-agent reinforcement learning (MARL) algorithms often suffer from an exponential
sample complexity dependence on the number of agents, a phenomenon known as the …

Cited by 55 · Related articles · All 5 versions
