An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGo series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Self-play fine-tuning converts weak language models to strong language models

Z Chen, Y Deng, H Yuan, K Ji, Q Gu - arXiv preprint arXiv:2401.01335, 2024 - arxiv.org
Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is
pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the …

The AI Economist: Taxation policy design via two-level deep multiagent reinforcement learning

S Zheng, A Trott, S Srinivasa, DC Parkes, R Socher - Science advances, 2022 - science.org
Artificial intelligence (AI) and reinforcement learning (RL) have improved many areas but are
not yet widely adopted in economic policy design, mechanism design, or economics at …

Language agents with reinforcement learning for strategic play in the Werewolf game

Z Xu, C Yu, F Fang, Y Wang, Y Wu - arXiv preprint arXiv:2310.18940, 2023 - arxiv.org
Agents built with large language models (LLMs) have recently achieved great
advancements. However, most of the efforts focus on single-agent or cooperative settings …

Distributed deep reinforcement learning: A survey and a multi-player multi-agent learning toolbox

Q Yin, T Yu, S Shen, J Yang, M Zhao, W Ni… - Machine Intelligence …, 2024 - Springer
With the breakthrough of AlphaGo, deep reinforcement learning has become a recognized
technique for solving sequential decision-making problems. Despite its reputation, data …

Modelling behavioural diversity for learning in open-ended games

N Perez-Nieves, Y Yang, O Slumbers… - International …, 2021 - proceedings.mlr.press
Promoting behavioural diversity is critical for solving games with non-transitive dynamics
where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors) …

Pipeline PSRO: A scalable approach for finding approximate Nash equilibria in large games

S McAleer, JB Lanier, R Fox… - Advances in neural …, 2020 - proceedings.neurips.cc
Finding approximate Nash equilibria in zero-sum imperfect-information games is
challenging when the number of information states is large. Policy Space Response Oracles …

Team-PSRO for learning approximate TMECor in large team games via cooperative reinforcement learning

S McAleer, G Farina, G Zhou, M Wang… - Advances in …, 2023 - proceedings.neurips.cc
Recent algorithms have achieved superhuman performance at a number of two-player
zero-sum games such as poker and Go. However, many real-world situations are multi-player …

Multi-agent training beyond zero-sum with correlated equilibrium meta-solvers

L Marris, P Muller, M Lanctot, K Tuyls… - … on Machine Learning, 2021 - proceedings.mlr.press
Two-player, constant-sum games are well studied in the literature, but there has been limited
progress outside of this setting. We propose Joint Policy-Space Response Oracles (JPSRO) …

Strategic knowledge transfer

MO Smith, T Anthony, MP Wellman - Journal of Machine Learning …, 2023 - jmlr.org
In the course of playing or solving a game, it is common to face a series of changing
other-agent strategies. These strategies often share elements: the set of possible policies to play …