Optimistic initialization and greediness lead to polynomial time learning in factored MDPs

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3229 Related articles All 9 versions

[PDF] neurips.cc

Near-optimal reinforcement learning in factored MDPs

I Osband, B Van Roy - Advances in Neural Information …, 2014 - proceedings.neurips.cc

Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs)
will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and …

Cited by 126 Related articles All 8 versions

[PDF] neurips.cc

Information-theoretic confidence bounds for reinforcement learning

X Lu, B Van Roy - Advances in Neural Information …, 2019 - proceedings.neurips.cc

We integrate information-theoretic concepts into the design and analysis of optimistic
algorithms and Thompson sampling. By making a connection between information-theoretic …

Cited by 63 Related articles All 8 versions

[PDF] neurips.cc

Learning in congestion games with bandit feedback

Q Cui, Z Xiong, M Fazel, SS Du - Advances in Neural …, 2022 - proceedings.neurips.cc

In this paper, we investigate Nash-regret minimization in congestion games, a class of
games with benign theoretical structure and broad real-world applications. We first propose …

Cited by 15 Related articles All 10 versions

[HTML] sciencedirect.com

[HTML] Online reinforcement learning for condition-based group maintenance using factored Markov decision processes

J Xu, B Liu, X Zhao, XL Wang - European Journal of Operational Research, 2024 - Elsevier

We investigate a condition-based group maintenance problem for multi-component systems,
where the degradation process of a specific component is affected only by its neighbouring …

Cited by 6 Related articles All 7 versions

[PDF] neurips.cc

Towards minimax optimal reinforcement learning in factored Markov decision processes

Y Tian, J Qian, S Sra - Advances in Neural Information …, 2020 - proceedings.neurips.cc

We study minimax optimal reinforcement learning in episodic factored Markov decision
processes (FMDPs), which are MDPs with conditionally independent transition components …

Cited by 28 Related articles All 7 versions

[PDF] neurips.cc

Oracle-efficient regret minimization in factored MDPs with unknown structure

A Rosenberg, Y Mansour - Advances in Neural Information …, 2021 - proceedings.neurips.cc

We study regret minimization in non-episodic factored Markov decision processes (FMDPs),
where all existing algorithms make the strong assumption that the factored structure of the …

Cited by 10 Related articles All 7 versions

[PDF] wdfiles.com

[PDF] Reinforcement Learning: Foundations

S Mannor, Y Mansour, A Tamar - Online manuscript, 2022 - rl-tau-2023.wdfiles.com

Concisely defined, Reinforcement Learning, abbreviated as RL, is the discipline of learning
and acting in environments where sequential decisions are made. That is, the decision …

Cited by 11 Related articles All 2 versions

[PDF] mlr.press

Improved exploration in factored average-reward MDPs

MS Talebi, A Jonsson… - … conference on artificial …, 2021 - proceedings.mlr.press

We consider a regret minimization task under the average-reward criterion in an unknown
Factored Markov Decision Process (FMDP). More specifically, we consider an FMDP where …

Cited by 8 Related articles All 6 versions

[PDF] researchgate.net

[PDF] Oracle-efficient reinforcement learning in factored MDPs with unknown structure

A Rosenberg, Y Mansour - arXiv preprint arXiv:2009.05986, 2020 - researchgate.net

We consider provably-efficient reinforcement learning (RL) in non-episodic factored Markov
decision processes (FMDPs). All previous algorithms for regret minimization in this setting …

Cited by 8 Related articles All 4 versions
