An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

A survey of reinforcement learning algorithms for dynamically varying environments

S Padakandla - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Reinforcement learning (RL) algorithms find applications in inventory control, recommender
systems, vehicular traffic management, cloud computing, and robotics. The real-world …

A definition of continual reinforcement learning

D Abel, A Barreto, B Van Roy… - Advances in …, 2024 - proceedings.neurips.cc
In a standard view of the reinforcement learning problem, an agent's goal is to efficiently
identify a policy that maximizes long-term reward. However, this perspective is based on a …

Non-stationary reinforcement learning without prior knowledge: An optimal black-box approach

CY Wei, H Luo - Conference on learning theory, 2021 - proceedings.mlr.press
We propose a black-box reduction that turns a certain reinforcement learning algorithm with
optimal regret in a (near-) stationary environment into another algorithm with optimal …

A unified view of entropy-regularized Markov decision processes

G Neu, A Jonsson, V Gómez - arXiv preprint arXiv:1705.07798, 2017 - arxiv.org
We propose a general framework for entropy-regularized average-reward reinforcement
learning in Markov decision processes (MDPs). Our approach is based on extending the …

Reinforcement learning algorithm for non-stationary environments

S Padakandla, P KJ, S Bhatnagar - Applied Intelligence, 2020 - Springer
Reinforcement learning (RL) methods learn optimal decisions in the presence of a stationary
environment. However, the stationary assumption on the environment is very restrictive. In …

Reinforcement learning for non-stationary Markov decision processes: The blessing of (more) optimism

WC Cheung, D Simchi-Levi… - … conference on machine …, 2020 - proceedings.mlr.press
We consider un-discounted reinforcement learning (RL) in Markov decision processes
(MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions …

A model selection approach for corruption robust reinforcement learning

CY Wei, C Dann, J Zimmert - International Conference on …, 2022 - proceedings.mlr.press
We develop a model selection approach to tackle reinforcement learning with adversarial
corruption in both transition and reward. For finite-horizon tabular MDPs, without prior …

Online reinforcement learning in stochastic games

CY Wei, YT Hong, CJ Lu - Advances in Neural Information …, 2017 - proceedings.neurips.cc
We study online reinforcement learning in average-reward stochastic games (SGs). An SG
models a two-player zero-sum game in a Markov environment, where state transitions and …

Near-optimal model-free reinforcement learning in non-stationary episodic MDPs

W Mao, K Zhang, R Zhu… - … on Machine Learning, 2021 - proceedings.mlr.press
We consider model-free reinforcement learning (RL) in non-stationary Markov decision
processes. Both the reward functions and the state transition functions are allowed to vary …