Online learning in Markov decision processes with changing cost sequences

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org

Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Cited by 337 - Related articles - All 2 versions

[PDF] arxiv.org

A survey of reinforcement learning algorithms for dynamically varying environments

S Padakandla - ACM Computing Surveys (CSUR), 2021 - dl.acm.org

Reinforcement learning (RL) algorithms find applications in inventory control, recommender
systems, vehicular traffic management, cloud computing, and robotics. The real-world …

Cited by 167 - Related articles - All 6 versions

[PDF] neurips.cc

A definition of continual reinforcement learning

D Abel, A Barreto, B Van Roy… - Advances in …, 2024 - proceedings.neurips.cc

In a standard view of the reinforcement learning problem, an agent's goal is to efficiently
identify a policy that maximizes long-term reward. However, this perspective is based on a …

Cited by 58 - Related articles - All 8 versions

[PDF] mlr.press

Non-stationary reinforcement learning without prior knowledge: An optimal black-box approach

CY Wei, H Luo - Conference on learning theory, 2021 - proceedings.mlr.press

We propose a black-box reduction that turns a certain reinforcement learning algorithm with
optimal regret in a (near-) stationary environment into another algorithm with optimal …

Cited by 118 - Related articles - All 4 versions

[PDF] arxiv.org

A unified view of entropy-regularized Markov decision processes

G Neu, A Jonsson, V Gómez - arXiv preprint arXiv:1705.07798, 2017 - arxiv.org

We propose a general framework for entropy-regularized average-reward reinforcement
learning in Markov decision processes (MDPs). Our approach is based on extending the …

Cited by 289 - Related articles - All 9 versions

[PDF] arxiv.org

Reinforcement learning algorithm for non-stationary environments

S Padakandla, P KJ, S Bhatnagar - Applied Intelligence, 2020 - Springer

Reinforcement learning (RL) methods learn optimal decisions in the presence of a stationary
environment. However, the stationary assumption on the environment is very restrictive. In …

Cited by 152 - Related articles - All 9 versions

[PDF] mlr.press

Reinforcement learning for non-stationary Markov decision processes: The blessing of (more) optimism

WC Cheung, D Simchi-Levi… - … conference on machine …, 2020 - proceedings.mlr.press

We consider un-discounted reinforcement learning (RL) in Markov decision processes
(MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions …

Cited by 116 - Related articles - All 7 versions

[PDF] mlr.press

A model selection approach for corruption robust reinforcement learning

CY Wei, C Dann, J Zimmert - International Conference on …, 2022 - proceedings.mlr.press

We develop a model selection approach to tackle reinforcement learning with adversarial
corruption in both transition and reward. For finite-horizon tabular MDPs, without prior …

Cited by 61 - Related articles - All 6 versions

[PDF] neurips.cc

Online reinforcement learning in stochastic games

CY Wei, YT Hong, CJ Lu - Advances in Neural Information …, 2017 - proceedings.neurips.cc

We study online reinforcement learning in average-reward stochastic games (SGs). An SG
models a two-player zero-sum game in a Markov environment, where state transitions and …

Cited by 149 - Related articles - All 7 versions

[PDF] mlr.press

Near-optimal model-free reinforcement learning in non-stationary episodic MDPs

W Mao, K Zhang, R Zhu… - … on Machine Learning, 2021 - proceedings.mlr.press

We consider model-free reinforcement learning (RL) in non-stationary Markov decision
processes. Both the reward functions and the state transition functions are allowed to vary …

Cited by 44 - Related articles - All 5 versions
