Independent policy gradient for large-scale Markov potential games: Sharper rates, function approximation, and game-agnostic convergence

D Ding, CY Wei, K Zhang… - … Conference on Machine …, 2022 - proceedings.mlr.press
We examine global non-asymptotic convergence properties of policy gradient methods for
multi-agent reinforcement learning (RL) problems in Markov potential games (MPGs). To …

Global convergence of multi-agent policy gradient in Markov potential games

S Leonardos, W Overman, I Panageas… - arXiv preprint arXiv …, 2021 - arxiv.org
Potential games are arguably one of the most important and widely studied classes of
normal form games. They define the archetypal setting of multi-agent coordination as all …

V-Learning--A Simple, Efficient, Decentralized Algorithm for Multiagent RL

C Jin, Q Liu, Y Wang, T Yu - arXiv preprint arXiv:2110.14555, 2021 - arxiv.org
A major challenge of multiagent reinforcement learning (MARL) is the curse of multiagents,
where the size of the joint action space scales exponentially with the number of agents. This …

Independent learning in stochastic games

A Ozdaglar, MO Sayin, K Zhang - International Congress of …, 2021 - ems.press
Reinforcement learning (RL) has recently achieved tremendous successes in many artificial
intelligence applications. Many of the forefront applications of RL involve multiple agents …

The complexity of Markov equilibrium in stochastic games

C Daskalakis, N Golowich… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We show that computing approximate stationary Markov coarse correlated equilibria (CCE)
in general-sum stochastic games is PPAD-hard, even when there are two players, the game …

Decentralized Q-learning in zero-sum Markov games

M Sayin, K Zhang, D Leslie, T Basar… - Advances in Neural …, 2021 - proceedings.neurips.cc
We study multi-agent reinforcement learning (MARL) in infinite-horizon discounted zero-sum
Markov games. We focus on the practical but challenging setting of decentralized MARL …

A minimaximalist approach to reinforcement learning from human feedback

G Swamy, C Dann, R Kidambi, ZS Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement
learning from human feedback. Our approach is minimalist in that it does not require training …

Last-iterate convergence of decentralized optimistic gradient descent/ascent in infinite-horizon competitive Markov games

CY Wei, CW Lee, M Zhang… - Conference on learning …, 2021 - proceedings.mlr.press
We study infinite-horizon discounted two-player zero-sum Markov games, and develop a
decentralized algorithm that provably converges to the set of Nash equilibria under self-play …

Efficient methods for structured nonconvex-nonconcave min-max optimization

J Diakonikolas, C Daskalakis… - … Conference on Artificial …, 2021 - proceedings.mlr.press
The use of min-max optimization in the adversarial training of deep neural network
classifiers, and the training of generative adversarial networks has motivated the study of …

On improving model-free algorithms for decentralized multi-agent reinforcement learning

W Mao, L Yang, K Zhang… - … Conference on Machine …, 2022 - proceedings.mlr.press
Multi-agent reinforcement learning (MARL) algorithms often suffer from an exponential
sample complexity dependence on the number of agents, a phenomenon known as the …