An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

What are higher-order networks?

C Bick, E Gross, HA Harrington, MT Schaub - SIAM Review, 2023 - SIAM
Network-based modeling of complex systems and data using the language of graphs has
become an essential topic across a range of different disciplines. Arguably, this graph-based …

The mechanics of n-player differentiable games

D Balduzzi, S Racaniere, J Martens… - International …, 2018 - proceedings.mlr.press
The cornerstone underpinning deep learning is the guarantee that gradient descent on an
objective converges to local minima. Unfortunately, this guarantee fails in settings, such as …

Open-ended learning in symmetric zero-sum games

D Balduzzi, M Garnelo, Y Bachrach… - International …, 2019 - proceedings.mlr.press
Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of
agents, for example labeling them 'winner' and 'loser'. If the game is approximately transitive …

Hodge Laplacians on graphs

LH Lim - SIAM Review, 2020 - SIAM
This is an elementary introduction to the Hodge Laplacian on a graph, a higher-order
generalization of the graph Laplacian. We will discuss basic properties including …

On last-iterate convergence beyond zero-sum games

I Anagnostides, I Panageas, G Farina… - International …, 2022 - proceedings.mlr.press
Most existing results about last-iterate convergence of learning dynamics are limited to
two-player zero-sum games, and only apply under rigid assumptions about what dynamics the …

α-Rank: Multi-Agent Evaluation by Evolution

S Omidshafiei, C Papadimitriou, G Piliouras, K Tuyls… - Scientific reports, 2019 - nature.com
We introduce α-Rank, a principled evolutionary dynamics methodology, for the evaluation
and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical …

Modelling behavioural diversity for learning in open-ended games

N Perez-Nieves, Y Yang, O Slumbers… - International …, 2021 - proceedings.mlr.press
Promoting behavioural diversity is critical for solving games with non-transitive dynamics
where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors) …

Policy space diversity for non-transitive games

J Yao, W Liu, H Fu, Y Yang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Policy-Space Response Oracles (PSRO) is an influential algorithm framework for
approximating a Nash Equilibrium (NE) in multi-agent non-transitive games. Many previous …

Re-evaluating evaluation

D Balduzzi, K Tuyls, J Perolat… - Advances in Neural …, 2018 - proceedings.neurips.cc
Progress in machine learning is measured by careful evaluation on problems of outstanding
common interest. However, the proliferation of benchmark suites and environments …