Multi-agent reinforcement learning: A selective overview of theories and algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer
Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

A theoretical analysis of deep Q-learning

J Fan, Z Wang, Y Xie, Z Yang - Learning for dynamics and …, 2020 - proceedings.mlr.press
Despite the great empirical success of deep reinforcement learning, its theoretical
foundation is less well understood. In this work, we make the first attempt to theoretically …

Multi-agent reinforcement learning in sequential social dilemmas

JZ Leibo, V Zambaldi, M Lanctot, J Marecki… - arXiv preprint arXiv …, 2017 - arxiv.org
Matrix games like Prisoner's Dilemma have guided research on social dilemmas for
decades. However, they necessarily treat the choice to cooperate or defect as an atomic …

A GRASP × ILS for the vehicle routing problem with time windows, synchronization and precedence constraints

SRA Haddadene, N Labadie, C Prodhon - Expert Systems with Applications, 2016 - Elsevier
Efficient use of resources while ensuring quality services points the attention of Home Health
Care structures (HHC). HHC structures propose keeping at home patients who do not …

Finite-sample analysis for SARSA with linear function approximation

S Zou, T Xu, Y Liang - Advances in neural information …, 2019 - proceedings.neurips.cc
SARSA is an on-policy algorithm to learn a Markov decision process policy in reinforcement
learning. We investigate the SARSA algorithm with linear function approximation under the …

Learning zero-sum simultaneous-move Markov games using function approximation and correlated equilibrium

Q Xie, Y Chen, Z Wang, Z Yang - Conference on learning …, 2020 - proceedings.mlr.press
In this work, we develop provably efficient reinforcement learning algorithms for two-player
zero-sum Markov games with simultaneous moves. We consider a family of Markov games …

Policy optimization provably converges to Nash equilibria in zero-sum linear quadratic games

K Zhang, Z Yang, T Başar - Advances in Neural Information …, 2019 - proceedings.neurips.cc
We study the global convergence of policy optimization for finding the Nash equilibria (NE)
in zero-sum linear quadratic (LQ) games. To this end, we first investigate the landscape of …

Online reinforcement learning in stochastic games

CY Wei, YT Hong, CJ Lu - Advances in Neural Information …, 2017 - proceedings.neurips.cc
We study online reinforcement learning in average-reward stochastic games (SGs). An SG
models a two-player zero-sum game in a Markov environment, where state transitions and …

Towards general function approximation in zero-sum Markov games

B Huang, JD Lee, Z Wang, Z Yang - arXiv preprint arXiv:2107.14702, 2021 - arxiv.org
This paper considers two-player zero-sum finite-horizon Markov games with simultaneous
moves. The study focuses on the challenging settings where the value function or the model …

Online minimax Q network learning for two-player zero-sum Markov games

Y Zhu, D Zhao - IEEE Transactions on Neural Networks and …, 2020 - ieeexplore.ieee.org
The Nash equilibrium is an important concept in game theory. It describes the least
exploitability of one player from any opponents. We combine game theory, dynamic …