Trust region bounds for decentralized PPO under non-stationarity

B Ellis, J Cook, S Moalla… - Advances in …, 2024 - proceedings.neurips.cc

The availability of challenging benchmarks has played a key role in the recent progress of
machine learning. In cooperative multi-agent reinforcement learning, the StarCraft Multi …

Cited by 85 | Related articles | All 6 versions

[PDF] arxiv.org

JaxMARL: Multi-agent RL environments in JAX

A Rutherford, B Ellis, M Gallici, J Cook, A Lupu… - arXiv preprint arXiv …, 2023 - arxiv.org

Benchmarks play an important role in the development of machine learning algorithms. For
example, research in reinforcement learning (RL) has been heavily influenced by available …

Cited by 30 | Related articles | All 4 versions

[PDF] aalto.fi

Optimistic multi-agent policy gradient

W Zhao, Y Zhao, Z Li, J Kannala… - Proceedings of Machine …, 2024 - research.aalto.fi

Relative overgeneralization (RO) occurs in cooperative multi-agent learning tasks when
agents converge towards a suboptimal joint policy due to overfitting to suboptimal behaviors …

[PDF] arxiv.org

Achieving Collective Welfare in Multi-Agent Reinforcement Learning via Suggestion Sharing

Y Jin, S Wei, G Montana - arXiv preprint arXiv:2412.12326, 2024 - arxiv.org

In human society, the conflict between self-interest and collective well-being often obstructs
efforts to achieve shared welfare. Related concepts like the Tragedy of the Commons and …

Configurable Mirror Descent: Towards a Unification of Decision Making

P Li, S Li, C Yang, X Wang, S Hu, X Huang… - arXiv preprint arXiv …, 2024 - arxiv.org

Decision-making problems, categorized as single-agent, e.g., Atari, cooperative multi-agent,
e.g., Hanabi, competitive multi-agent, e.g., Hold'em poker, and mixed cooperative and …

Cited by 1 | Related articles | All 3 versions

[PDF] kcl.ac.uk

[PDF] Off-agent trust region policy optimization

R Chen, X Zhang, Y Du, Y Zhong, Z Tian… - … Joint Conference on …, 2024 - kclpure.kcl.ac.uk

Leveraging the experiences of other agents offers a powerful mechanism to enhance policy
optimization in multi-agent reinforcement learning (MARL). However, contemporary MARL …

Safe Multiagent Coordination via Entropic Exploration

AA Aydeniz, E Marchesini, R Loftin, C Amato… - arXiv preprint arXiv …, 2024 - arxiv.org

Many real-world multiagent learning problems involve safety concerns. In these setups,
typical safe reinforcement learning algorithms constrain agents' behavior, limiting …

[PDF] arxiv.org

FP3O: Enabling Proximal Policy Optimization in Multi-Agent Cooperation with Parameter-Sharing Versatility

L Feng, D Xing, J Zhang, G Pan - arXiv preprint arXiv:2310.05053, 2023 - arxiv.org

Existing multi-agent PPO algorithms lack compatibility with different types of parameter
sharing when extending the theoretical guarantee of PPO to cooperative multi-agent …

Cited by 1 | Related articles | All 2 versions

[PDF] arxiv.org

Toward Finding Strong Pareto Optimal Policies in Multi-Agent Reinforcement Learning

BG Le, VC Ta - arXiv preprint arXiv:2410.19372, 2024 - arxiv.org

In this work, we study the problem of finding Pareto optimal policies in multi-agent
reinforcement learning problems with cooperative reward structures. We show that any …

Leveraging Team Correlation for Approximating Equilibrium in Two-Team Zero-Sum Games

N Liu, M Wang, Y Zhang, Y Yang, B An… - arXiv preprint arXiv …, 2024 - arxiv.org

Two-team zero-sum games are one of the most important paradigms in game theory. In this
paper, we focus on finding an unexploitable equilibrium in large team games. An …
