BYOL-Explore: Exploration by bootstrapped prediction

Z Guo, S Thakoor, M Pîslar… - Advances in neural …, 2022 - proceedings.neurips.cc
We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven
exploration in visually complex environments. BYOL-Explore learns the world …

Convex reinforcement learning in finite trials

M Mutti, R De Santi, P De Bartolomeis… - Journal of Machine …, 2023 - jmlr.org
Convex Reinforcement Learning (RL) is a recently introduced framework that generalizes
the standard RL objective to any convex (or concave) function of the state distribution …

Fast rates for maximum entropy exploration

D Tiapkin, D Belomestny… - International …, 2023 - proceedings.mlr.press
We address the challenge of exploration in reinforcement learning (RL) when the agent
operates in an unknown environment with sparse or no rewards. In this work, we study the …

Optimistic active exploration of dynamical systems

L Treven, C Sancaktar, S Blaes… - Advances in Neural …, 2023 - proceedings.neurips.cc
Reinforcement learning algorithms commonly seek to optimize policies for solving one
particular task. How should we explore an unknown dynamical system such that the …

CEM: Constrained entropy maximization for task-agnostic safe exploration

Q Yang, MTJ Spaan - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
In the absence of assigned tasks, a learning agent typically seeks to explore its environment
efficiently. However, the pursuit of exploration will bring more safety risks. An under-explored …

Submodular reinforcement learning

M Prajapat, M Mutný, MN Zeilinger… - arXiv preprint arXiv …, 2023 - arxiv.org
In reinforcement learning (RL), rewards of states are typically considered additive, and
following the Markov assumption, they are independent of states visited …

Active coverage for PAC reinforcement learning

A Al-Marjani, A Tirinzoni… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Collecting and leveraging data with good coverage properties plays a crucial role in different
aspects of reinforcement learning (RL), including reward-free exploration and offline …

Global reinforcement learning: Beyond linear and convex rewards via submodular semi-gradient methods

R De Santi, M Prajapat, A Krause - arXiv preprint arXiv:2407.09905, 2024 - arxiv.org
In classic Reinforcement Learning (RL), the agent maximizes an additive objective of the
visited states, e.g., a value function. Unfortunately, objectives of this type cannot model many …

Nearly optimal latent state decoding in block MDPs

Y Jedra, J Lee, A Proutiere… - … Conference on Artificial …, 2023 - proceedings.mlr.press
We consider the problem of model estimation in episodic Block MDPs. In these MDPs, the
decision maker has access to rich observations or contexts generated from a small number …

Provably efficient causal model-based reinforcement learning for systematic generalization

M Mutti, R De Santi, E Rossi, JF Calderon… - Proceedings of the …, 2023 - ojs.aaai.org
In the sequential decision making setting, an agent aims to achieve systematic
generalization over a large, possibly infinite, set of environments. Such environments are …