In a standard view of the reinforcement learning problem, an agent's goal is to efficiently identify a policy that maximizes long-term reward. However, this perspective is based on a …
CY Wei, H Luo - Conference on learning theory, 2021 - proceedings.mlr.press
We propose a black-box reduction that turns a certain reinforcement learning algorithm with optimal regret in a (near-) stationary environment into another algorithm with optimal …
We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the …
Reinforcement learning (RL) methods learn optimal decisions in the presence of a stationary environment. However, the assumption of a stationary environment is very restrictive. In …
We consider undiscounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions …
CY Wei, C Dann, J Zimmert - International Conference on …, 2022 - proceedings.mlr.press
We develop a model selection approach to tackle reinforcement learning with adversarial corruption in both transition and reward. For finite-horizon tabular MDPs, without prior …
CY Wei, YT Hong, CJ Lu - Advances in Neural Information …, 2017 - proceedings.neurips.cc
We study online reinforcement learning in average-reward stochastic games (SGs). An SG models a two-player zero-sum game in a Markov environment, where state transitions and …
W Mao, K Zhang, R Zhu… - … on Machine Learning, 2021 - proceedings.mlr.press
We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes. Both the reward functions and the state transition functions are allowed to vary …