Value function approximation in zero-sum Markov games

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer

Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

Cited by 1700 · Related articles · All 8 versions

[PDF] mlr.press

A theoretical analysis of deep Q-learning

J Fan, Z Wang, Y Xie, Z Yang - Learning for dynamics and …, 2020 - proceedings.mlr.press

Despite the great empirical success of deep reinforcement learning, its theoretical
foundation is less well understood. In this work, we make the first attempt to theoretically …

Cited by 857 · Related articles · All 9 versions

[PDF] arxiv.org

Multi-agent reinforcement learning in sequential social dilemmas

JZ Leibo, V Zambaldi, M Lanctot, J Marecki… - arXiv preprint arXiv …, 2017 - arxiv.org

Matrix games like Prisoner's Dilemma have guided research on social dilemmas for
decades. However, they necessarily treat the choice to cooperate or defect as an atomic …

Cited by 957 · Related articles · All 14 versions

A GRASP × ILS for the vehicle routing problem with time windows, synchronization and precedence constraints

SRA Haddadene, N Labadie, C Prodhon - Expert Systems with Applications, 2016 - Elsevier

Efficient use of resources while ensuring quality services points the attention of Home Health
Care structures (HHC). HHC structures propose keeping at home patients who do not …

Cited by 120 · Related articles · All 10 versions

[PDF] neurips.cc

Finite-sample analysis for SARSA with linear function approximation

S Zou, T Xu, Y Liang - Advances in neural information …, 2019 - proceedings.neurips.cc

SARSA is an on-policy algorithm to learn a Markov decision process policy in reinforcement
learning. We investigate the SARSA algorithm with linear function approximation under the …

Cited by 207 · Related articles · All 9 versions

[PDF] mlr.press

Learning zero-sum simultaneous-move Markov games using function approximation and correlated equilibrium

Q Xie, Y Chen, Z Wang, Z Yang - Conference on learning …, 2020 - proceedings.mlr.press

In this work, we develop provably efficient reinforcement learning algorithms for two-player
zero-sum Markov games with simultaneous moves. We consider a family of Markov games …

Cited by 161 · Related articles · All 6 versions

[PDF] neurips.cc

Policy optimization provably converges to Nash equilibria in zero-sum linear quadratic games

K Zhang, Z Yang, T Basar - Advances in Neural Information …, 2019 - proceedings.neurips.cc

We study the global convergence of policy optimization for finding the Nash equilibria (NE)
in zero-sum linear quadratic (LQ) games. To this end, we first investigate the landscape of …

Cited by 147 · Related articles · All 9 versions

[PDF] neurips.cc

Online reinforcement learning in stochastic games

CY Wei, YT Hong, CJ Lu - Advances in Neural Information …, 2017 - proceedings.neurips.cc

We study online reinforcement learning in average-reward stochastic games (SGs). An SG
models a two-player zero-sum game in a Markov environment, where state transitions and …

Cited by 150 · Related articles · All 7 versions

[PDF] arxiv.org

Towards general function approximation in zero-sum Markov games

B Huang, JD Lee, Z Wang, Z Yang - arXiv preprint arXiv:2107.14702, 2021 - arxiv.org

This paper considers two-player zero-sum finite-horizon Markov games with simultaneous
moves. The study focuses on the challenging settings where the value function or the model …

Cited by 59 · Related articles · All 5 versions

Online minimax Q network learning for two-player zero-sum Markov games

Y Zhu, D Zhao - IEEE Transactions on Neural Networks and …, 2020 - ieeexplore.ieee.org

The Nash equilibrium is an important concept in game theory. It describes the least
exploitability of one player from any opponents. We combine game theory, dynamic …

Cited by 86 · Related articles · All 3 versions
