Efficient model-free exploration in low-rank MDPs

Z Mhammedi, A Block, DJ Foster… - Advances in Neural …, 2024 - proceedings.neurips.cc
A major challenge in reinforcement learning is to develop practical, sample-efficient
algorithms for exploration in high-dimensional domains where generalization and function …

When is agnostic reinforcement learning statistically tractable?

Z Jia, G Li, A Rakhlin, A Sekhari… - Advances in Neural …, 2024 - proceedings.neurips.cc
We study the problem of agnostic PAC reinforcement learning (RL): given a policy class $\Pi$, how many rounds of interaction with an unknown MDP (with a potentially large state and …

Pessimism in the face of confounders: Provably efficient offline reinforcement learning in partially observable Markov decision processes

M Lu, Y Min, Z Wang, Z Yang - arXiv preprint arXiv:2205.13589, 2022 - arxiv.org
We study offline reinforcement learning (RL) in partially observable Markov decision
processes. In particular, we aim to learn an optimal policy from a dataset collected by a …

Timing as an Action: Learning When to Observe and Act

H Zhou, A Huang, K Azizzadenesheli… - International …, 2024 - proceedings.mlr.press
In standard reinforcement learning setups, the agent receives observations and performs
actions at evenly spaced intervals. However, in many real-world settings, observations are …

Harnessing density ratios for online reinforcement learning

P Amortila, DJ Foster, N Jiang, A Sekhari… - arXiv preprint arXiv …, 2024 - arxiv.org
The theories of offline and online reinforcement learning, despite having evolved in parallel,
have begun to show signs of the possibility for a unification, with algorithms and analysis …

Provably efficient CVaR RL in low-rank MDPs

Y Zhao, W Zhan, X Hu, H Leung, F Farnia… - arXiv preprint arXiv …, 2023 - arxiv.org
We study risk-sensitive Reinforcement Learning (RL), where we aim to maximize the Conditional Value at Risk (CVaR) with a fixed risk tolerance $\tau$. Prior theoretical work …

On the role of general function approximation in offline reinforcement learning

C Mao, Q Zhang, Z Wang, X Li - The Twelfth International …, 2024 - openreview.net
We study offline reinforcement learning (RL) with general function approximation. General
function approximation is a powerful tool for algorithm design and analysis, but its …

Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data

K Panaganti, A Wierman, E Mazumdar - arXiv preprint arXiv:2405.05468, 2024 - arxiv.org
The robust $\phi $-regularized Markov Decision Process (RRMDP) framework focuses on
designing control policies that are robust against parameter uncertainties due to mismatches …

Scalable Online Exploration via Coverability

P Amortila, DJ Foster, A Krishnamurthy - arXiv preprint arXiv:2403.06571, 2024 - arxiv.org
Exploration is a major challenge in reinforcement learning, especially for high-dimensional
domains that require function approximation. We propose exploration objectives--policy …

Offline Reinforcement Learning: Role of State Aggregation and Trajectory Data

Z Jia, A Rakhlin, A Sekhari, CY Wei - arXiv preprint arXiv:2403.17091, 2024 - arxiv.org
We revisit the problem of offline reinforcement learning with value function realizability but
without Bellman completeness. Previous work by Xie and Jiang (2021) and Foster et …