[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Near-optimal reinforcement learning in factored MDPs

I Osband, B Van Roy - Advances in Neural Information …, 2014 - proceedings.neurips.cc
Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs)
will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and …

Information-theoretic confidence bounds for reinforcement learning

X Lu, B Van Roy - Advances in Neural Information …, 2019 - proceedings.neurips.cc
We integrate information-theoretic concepts into the design and analysis of optimistic
algorithms and Thompson sampling. By making a connection between information-theoretic …

Learning in congestion games with bandit feedback

Q Cui, Z Xiong, M Fazel, SS Du - Advances in Neural …, 2022 - proceedings.neurips.cc
In this paper, we investigate Nash-regret minimization in congestion games, a class of
games with benign theoretical structure and broad real-world applications. We first propose …

[HTML] Online reinforcement learning for condition-based group maintenance using factored Markov decision processes

J Xu, B Liu, X Zhao, XL Wang - European Journal of Operational Research, 2024 - Elsevier
We investigate a condition-based group maintenance problem for multi-component systems,
where the degradation process of a specific component is affected only by its neighbouring …

Towards minimax optimal reinforcement learning in factored Markov decision processes

Y Tian, J Qian, S Sra - Advances in Neural Information …, 2020 - proceedings.neurips.cc
We study minimax optimal reinforcement learning in episodic factored Markov decision
processes (FMDPs), which are MDPs with conditionally independent transition components …

Oracle-efficient regret minimization in factored MDPs with unknown structure

A Rosenberg, Y Mansour - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We study regret minimization in non-episodic factored Markov decision processes (FMDPs),
where all existing algorithms make the strong assumption that the factored structure of the …

[PDF] Reinforcement Learning: Foundations

S Mannor, Y Mansour, A Tamar - Online manuscript, 2022 - rl-tau-2023.wdfiles.com
Concisely defined, Reinforcement Learning, abbreviated as RL, is the discipline of learning
and acting in environments where sequential decisions are made. That is, the decision …

Improved exploration in factored average-reward MDPs

MS Talebi, A Jonsson… - … conference on artificial …, 2021 - proceedings.mlr.press
We consider a regret minimization task under the average-reward criterion in an unknown
Factored Markov Decision Process (FMDP). More specifically, we consider an FMDP where …

[PDF] Oracle-efficient reinforcement learning in factored MDPs with unknown structure

A Rosenberg, Y Mansour - arXiv preprint arXiv:2009.05986, 2020 - researchgate.net
We consider provably-efficient reinforcement learning (RL) in non-episodic factored Markov
decision processes (FMDPs). All previous algorithms for regret minimization in this setting …