Cal-QL: Calibrated offline RL pre-training for efficient online fine-tuning

M Nakamoto, S Zhai, A Singh… - Advances in …, 2024 - proceedings.neurips.cc
A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization
from existing datasets followed by fast online fine-tuning with limited interaction. However …

Open problems in cooperative AI

A Dafoe, E Hughes, Y Bachrach, T Collins… - arXiv preprint arXiv …, 2020 - arxiv.org
Problems of cooperation--in which agents seek ways to jointly improve their welfare--are
ubiquitous and important. They can be found at scales ranging from our daily routines--such …

DouZero: Mastering DouDizhu with self-play deep reinforcement learning

D Zha, J Xie, W Ma, S Zhang, X Lian… - … on machine learning, 2021 - proceedings.mlr.press
Games are abstractions of the real world, where artificial agents learn to compete and
cooperate with other agents. While significant achievements have been made in various …

Scalable evaluation of multi-agent reinforcement learning with Melting Pot

JZ Leibo, EA Dueñez-Guzman… - International …, 2021 - proceedings.mlr.press
Existing evaluation suites for multi-agent reinforcement learning (MARL) do not assess
generalization to novel situations as their primary objective (unlike supervised learning …

Unpacking reward shaping: Understanding the benefits of reward engineering on sample complexity

A Gupta, A Pacchiano, Y Zhai… - Advances in Neural …, 2022 - proceedings.neurips.cc
The success of reinforcement learning in a variety of challenging sequential
decision-making problems has been much discussed, but often ignored in this discussion is the …

Learning safe control for multi-robot systems: Methods, verification, and open challenges

K Garg, S Zhang, O So, C Dawson, C Fan - Annual Reviews in Control, 2024 - Elsevier
In this survey, we review the recent advances in control design methods for robotic
multi-agent systems (MAS), focusing on learning-based methods with safety considerations. We …

A review: machine learning for combinatorial optimization problems in energy areas

X Yang, Z Wang, H Zhang, N Ma, N Yang, H Liu… - Algorithms, 2022 - mdpi.com
Combinatorial optimization problems (COPs) are a class of NP-hard problems with great
practical significance. Traditional approaches for COPs suffer from high computational time …

A survey on transformers in reinforcement learning

W Li, H Luo, Z Lin, C Zhang, Z Lu, D Ye - arXiv preprint arXiv:2301.03044, 2023 - arxiv.org
Transformer has been considered the dominating neural architecture in NLP and CV, mostly
under supervised settings. Recently, a similar surge of using Transformers has appeared in …

Attacking deep reinforcement learning with decoupled adversarial policy

K Mo, W Tang, J Li, X Yuan - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
While Deep Reinforcement Learning (DRL) has achieved outstanding performance in
extensive applications, exploiting its vulnerability with adversarial attacks is essential …

Towards unifying behavioral and response diversity for open-ended learning in zero-sum games

X Liu, H Jia, Y Wen, Y Hu, Y Chen… - Advances in …, 2021 - proceedings.neurips.cc
Measuring and promoting policy diversity is critical for solving games with strong
non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock …