Planning with expectation models

HP Van Hasselt, M Hessel… - Advances in Neural …, 2019 - proceedings.neurips.cc

We examine the question of when and how parametric models are most useful in
reinforcement learning. In particular, we look at commonalities and differences between …

Cited by 236 · Related articles · All 8 versions

[PDF] mlr.press

Average-reward off-policy policy evaluation with function approximation

S Zhang, Y Wan, RS Sutton… - … conference on machine …, 2021 - proceedings.mlr.press

We consider off-policy policy evaluation with function approximation (FA) in average-reward
MDPs, where the goal is to estimate both the reward rate and the differential value function …

Cited by 39 · Related articles · All 8 versions

[HTML] sciencedirect.com

[HTML] Reward-respecting subtasks for model-based reinforcement learning

RS Sutton, MC Machado, GZ Holland, D Szepesvari… - Artificial Intelligence, 2023 - Elsevier

To achieve the ambitious goals of artificial intelligence, reinforcement learning must include
planning with a model of the world that is abstract in state and time. Deep learning has made …

Cited by 24 · Related articles · All 9 versions

[PDF] neurips.cc

Forethought and hindsight in credit assignment

V Chelu, D Precup… - Advances in Neural …, 2020 - proceedings.neurips.cc

We address the problem of credit assignment in reinforcement learning and explore
fundamental questions regarding the way in which an agent can best use additional …

Cited by 35 · Related articles · All 7 versions

[HTML] sciencedirect.com

[HTML] Investigating the properties of neural network representations in reinforcement learning

H Wang, E Miahi, M White, MC Machado, Z Abbas… - Artificial Intelligence, 2024 - Elsevier

In this paper we investigate the properties of representations learned by deep reinforcement
learning systems. Much of the early work on representations for reinforcement learning …

Cited by 28 · Related articles · All 4 versions

[PDF] neurips.cc

Novelty search in representational space for sample efficient exploration

RY Tao, V François-Lavet… - Advances in Neural …, 2020 - proceedings.neurips.cc

We present a new approach for efficient exploration which leverages a low-dimensional
encoding of the environment learned with a combination of model-based and model-free …

Cited by 50 · Related articles · All 7 versions

[PDF] mlr.press

Towards evaluating adaptivity of model-based reinforcement learning methods

Y Wan, A Rahimi-Kalahroudi… - International …, 2022 - proceedings.mlr.press

In recent years, a growing number of deep model-based reinforcement learning (RL)
methods have been introduced. The interest in deep model-based RL is not surprising …

Cited by 14 · Related articles · All 6 versions

[PDF] hw.ac.uk

Planning, execution, and adaptation for multi-robot systems using probabilistic and temporal planning

Y Carreno, JHA Ng, Y Petillot… - … Agents and Multiagent …, 2022 - researchportal.hw.ac.uk

Planning for multi-robot coordination during long horizon missions in complex environments
need to consider resources, temporal constraints, and uncertainty. This could be …

Cited by 9 · Related articles · All 8 versions

[PDF] arxiv.org

Off-policy maximum entropy reinforcement learning: Soft actor-critic with advantage weighted mixture policy (SAC-AWMP)

Z Hou, K Zhang, Y Wan, D Li, C Fu, H Yu - arXiv preprint arXiv:2002.02829, 2020 - arxiv.org

The optimal policy of a reinforcement learning problem is often discontinuous and non-smooth.
I.e., for two states with similar representations, their optimal policies can be …

Cited by 19 · Related articles · All 3 versions

[PDF] arxiv.org

Bounding-box inference for error-aware model-based reinforcement learning

EJ Talvitie, Z Shao, H Li, J Hu, J Boerma… - arXiv preprint arXiv …, 2024 - arxiv.org

In model-based reinforcement learning, simulated experiences from the learned model are
often treated as equivalent to experience from the real environment. However, when the …

Cited by 1 · Related articles · All 4 versions
