Active model estimation in markov decision processes

Z Guo, S Thakoor, M Pîslar… - Advances in neural …, 2022 - proceedings.neurips.cc

We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven
exploration in visually complex environments. BYOL-Explore learns the world …

被引用次数：72 相关文章所有 5 个版本

[PDF] jmlr.org

Convex reinforcement learning in finite trials

M Mutti, R De Santi, P De Bartolomeis… - Journal of Machine …, 2023 - jmlr.org

Convex Reinforcement Learning (RL) is a recently introduced framework that generalizes
the standard RL objective to any convex (or concave) function of the state distribution …

被引用次数：14 相关文章所有 5 个版本

[PDF] mlr.press

Fast rates for maximum entropy exploration

D Tiapkin, D Belomestny… - International …, 2023 - proceedings.mlr.press

We address the challenge of exploration in reinforcement learning (RL) when the agent
operates in an unknown environment with sparse or no rewards. In this work, we study the …

被引用次数：15 相关文章所有 9 个版本

[PDF] neurips.cc

Optimistic active exploration of dynamical systems

L Treven, C Sancaktar, S Blaes… - Advances in Neural …, 2023 - proceedings.neurips.cc

Reinforcement learning algorithms commonly seek to optimize policies for solving one
particular task. How should we explore an unknown dynamical system such that the …

被引用次数：15 相关文章所有 6 个版本

[PDF] aaai.org

Cem: Constrained entropy maximization for task-agnostic safe exploration

Q Yang, MTJ Spaan - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org

In the absence of assigned tasks, a learning agent typically seeks to explore its environment
efficiently. However, the pursuit of exploration will bring more safety risks. An under-explored …

被引用次数：16 相关文章所有 7 个版本

[PDF] arxiv.org

Submodular reinforcement learning

M Prajapat, M Mutný, MN Zeilinger… - arXiv preprint arXiv …, 2023 - arxiv.org

In reinforcement learning (RL), rewards of states are typically considered additive, and
following the Markov assumption, they are $\textit {independent} $ of states visited …

被引用次数：12 相关文章所有 5 个版本

[PDF] mlr.press

Active coverage for pac reinforcement learning

A Al-Marjani, A Tirinzoni… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press

Collecting and leveraging data with good coverage properties plays a crucial role in different
aspects of reinforcement learning (RL), including reward-free exploration and offline …

被引用次数：8 相关文章所有 8 个版本

[PDF] arxiv.org

Global reinforcement learning: Beyond linear and convex rewards via submodular semi-gradient methods

R De Santi, M Prajapat, A Krause - arXiv preprint arXiv:2407.09905, 2024 - arxiv.org

In classic Reinforcement Learning (RL), the agent maximizes an additive objective of the
visited states, eg, a value function. Unfortunately, objectives of this type cannot model many …

被引用次数：3 相关文章

[PDF] mlr.press

Nearly optimal latent state decoding in block mdps

Y Jedra, J Lee, A Proutiere… - … Conference on Artificial …, 2023 - proceedings.mlr.press

We consider the problem of model estimation in episodic Block MDPs. In these MDPs, the
decision maker has access to rich observations or contexts generated from a small number …

被引用次数：12 相关文章所有 6 个版本

[PDF] aaai.org

Provably efficient causal model-based reinforcement learning for systematic generalization

M Mutti, R De Santi, E Rossi, JF Calderon… - Proceedings of the …, 2023 - ojs.aaai.org

In the sequential decision making setting, an agent aims to achieve systematic
generalization over a large, possibly infinite, set of environments. Such environments are …

被引用次数：16 相关文章所有 8 个版本

高级搜索

QQ 群