I Osband, B Van Roy - Advances in Neural Information …, 2014 - proceedings.neurips.cc
Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and …
X Lu, B Van Roy - Advances in Neural Information …, 2019 - proceedings.neurips.cc
We integrate information-theoretic concepts into the design and analysis of optimistic algorithms and Thompson sampling. By making a connection between information-theoretic …
In this paper, we investigate Nash-regret minimization in congestion games, a class of games with benign theoretical structure and broad real-world applications. We first propose …
J Xu, B Liu, X Zhao, XL Wang - European Journal of Operational Research, 2024 - Elsevier
We investigate a condition-based group maintenance problem for multi-component systems, where the degradation process of a specific component is affected only by its neighbouring …
Y Tian, J Qian, S Sra - Advances in Neural Information …, 2020 - proceedings.neurips.cc
We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components …
A Rosenberg, Y Mansour - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We study regret minimization in non-episodic factored Markov decision processes (FMDPs), where all existing algorithms make the strong assumption that the factored structure of the …
Concisely defined, Reinforcement Learning, abbreviated as RL, is the discipline of learning and acting in environments where sequential decisions are made. That is, the decision …
MS Talebi, A Jonsson… - … conference on artificial …, 2021 - proceedings.mlr.press
We consider a regret minimization task under the average-reward criterion in an unknown Factored Markov Decision Process (FMDP). More specifically, we consider an FMDP where …
A Rosenberg, Y Mansour - arXiv preprint arXiv:2009.05986, 2020 - researchgate.net
We consider provably-efficient reinforcement learning (RL) in non-episodic factored Markov decision processes (FMDPs). All previous algorithms for regret minimization in this setting …