A generalized training approach for multiagent learning

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org

Following the remarkable success of the AlphaGo series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Cited by 349 Related articles All 2 versions

[PDF] arxiv.org

Self-play fine-tuning converts weak language models to strong language models

Z Chen, Y Deng, H Yuan, K Ji, Q Gu - arXiv preprint arXiv:2401.01335, 2024 - arxiv.org

Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is
pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the …

Cited by 185 Related articles All 3 versions

[PDF] science.org

The AI Economist: Taxation policy design via two-level deep multiagent reinforcement learning

S Zheng, A Trott, S Srinivasa, DC Parkes, R Socher - Science advances, 2022 - science.org

Artificial intelligence (AI) and reinforcement learning (RL) have improved many areas but are
not yet widely adopted in economic policy design, mechanism design, or economics at …

Cited by 165 Related articles All 10 versions

[PDF] arxiv.org

Language agents with reinforcement learning for strategic play in the werewolf game

Z Xu, C Yu, F Fang, Y Wang, Y Wu - arXiv preprint arXiv:2310.18940, 2023 - arxiv.org

Agents built with large language models (LLMs) have recently achieved great
advancements. However, most of the efforts focus on single-agent or cooperative settings …

Cited by 62 Related articles All 4 versions

[PDF] springer.com

Distributed deep reinforcement learning: A survey and a multi-player multi-agent learning toolbox

Q Yin, T Yu, S Shen, J Yang, M Zhao, W Ni… - Machine Intelligence …, 2024 - Springer

With the breakthrough of AlphaGo, deep reinforcement learning has become a recognized
technique for solving sequential decision-making problems. Despite its reputation, data …

Cited by 17 Related articles All 5 versions

[PDF] mlr.press

Modelling behavioural diversity for learning in open-ended games

N Perez-Nieves, Y Yang, O Slumbers… - International …, 2021 - proceedings.mlr.press

Promoting behavioural diversity is critical for solving games with non-transitive dynamics
where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors) …

Cited by 71 Related articles All 5 versions

[PDF] neurips.cc

Pipeline PSRO: A scalable approach for finding approximate Nash equilibria in large games

S McAleer, JB Lanier, R Fox… - Advances in neural …, 2020 - proceedings.neurips.cc

Finding approximate Nash equilibria in zero-sum imperfect-information games is
challenging when the number of information states is large. Policy Space Response Oracles …

Cited by 87 Related articles All 13 versions

[PDF] neurips.cc

Team-PSRO for learning approximate TMECor in large team games via cooperative reinforcement learning

S McAleer, G Farina, G Zhou, M Wang… - Advances in …, 2023 - proceedings.neurips.cc

Recent algorithms have achieved superhuman performance at a number of two-player zero-sum
games such as poker and Go. However, many real-world situations are multi-player …

Cited by 9 Related articles All 3 versions

[PDF] mlr.press

Multi-agent training beyond zero-sum with correlated equilibrium meta-solvers

L Marris, P Muller, M Lanctot, K Tuyls… - … on Machine Learning, 2021 - proceedings.mlr.press

Two-player, constant-sum games are well studied in the literature, but there has been limited
progress outside of this setting. We propose Joint Policy-Space Response Oracles (JPSRO) …

Cited by 44 Related articles All 6 versions

[PDF] jmlr.org

Strategic knowledge transfer

MO Smith, T Anthony, MP Wellman - Journal of Machine Learning …, 2023 - jmlr.org

In the course of playing or solving a game, it is common to face a series of changing
other-agent strategies. These strategies often share elements: the set of possible policies to play …

Cited by 3 Related articles