Active exploration in Markov decision processes

Z Guo, S Thakoor, M Pîslar… - Advances in neural …, 2022 - proceedings.neurips.cc

We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven
exploration in visually complex environments. BYOL-Explore learns the world …

Cited by 72 - Related articles - All 5 versions

Reward-free exploration for reinforcement learning

C Jin, A Krishnamurthy… - … on Machine Learning, 2020 - proceedings.mlr.press

Exploration is widely regarded as one of the most challenging aspects of reinforcement
learning (RL), with many naive approaches succumbing to exponential sample complexity …

Cited by 264 - Related articles - All 6 versions

Convex reinforcement learning in finite trials

M Mutti, R De Santi, P De Bartolomeis… - Journal of Machine …, 2023 - jmlr.org

Convex Reinforcement Learning (RL) is a recently introduced framework that generalizes
the standard RL objective to any convex (or concave) function of the state distribution …

Cited by 14 - Related articles - All 5 versions

Maximum state entropy exploration using predecessor and successor representations

AK Jain, L Lehnert, I Rish… - Advances in Neural …, 2024 - proceedings.neurips.cc

Animals have a developed ability to explore that aids them in important tasks such as
locating food, exploring for shelter, and finding misplaced items. These exploration skills …

Cited by 9 - Related articles - All 7 versions

CEM: Constrained entropy maximization for task-agnostic safe exploration

Q Yang, MTJ Spaan - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org

In the absence of assigned tasks, a learning agent typically seeks to explore its environment
efficiently. However, the pursuit of exploration will bring more safety risks. An under-explored …

Cited by 16 - Related articles - All 7 versions

Constrained episodic reinforcement learning in concave-convex and knapsack settings

K Brantley, M Dudik, T Lykouris… - Advances in …, 2020 - proceedings.neurips.cc

We propose an algorithm for tabular episodic reinforcement learning with constraints. We
provide a modular analysis with strong theoretical guarantees for settings with concave …

Cited by 61 - Related articles - All 12 versions

Submodular reinforcement learning

M Prajapat, M Mutný, MN Zeilinger… - arXiv preprint arXiv …, 2023 - arxiv.org

In reinforcement learning (RL), rewards of states are typically considered additive, and
following the Markov assumption, they are independent of states visited …

Cited by 12 - Related articles - All 5 versions

Regret guarantees for model-based reinforcement learning with long-term average constraints

M Agarwal, Q Bai, V Aggarwal - Uncertainty in Artificial …, 2022 - proceedings.mlr.press

We consider the problem of constrained Markov Decision Process (CMDP) where an agent
interacts with an ergodic Markov Decision Process. At every interaction, the agent obtains a …

Cited by 18 - Related articles - All 3 versions

Active exploration via experiment design in Markov chains

M Mutny, T Janik, A Krause - International Conference on …, 2023 - proceedings.mlr.press

A key challenge in science and engineering is to design experiments to learn about some
unknown quantity of interest. Classical experimental design optimally allocates the …

Cited by 18 - Related articles - All 4 versions

The importance of non-Markovianity in maximum state entropy exploration

M Mutti, R De Santi, M Restelli - International Conference on …, 2022 - proceedings.mlr.press

In the maximum state entropy exploration framework, an agent interacts with a reward-free
environment to learn a policy that maximizes the entropy of the expected state visitations it is …

Cited by 30 - Related articles - All 7 versions
