Efficient model-free exploration in low-rank MDPs

Z Mhammedi, A Block, DJ Foster… - Advances in Neural …, 2024 - proceedings.neurips.cc
A major challenge in reinforcement learning is to develop practical, sample-efficient
algorithms for exploration in high-dimensional domains where generalization and function …

When is agnostic reinforcement learning statistically tractable?

Z Jia, G Li, A Rakhlin, A Sekhari… - Advances in Neural …, 2024 - proceedings.neurips.cc
We study the problem of agnostic PAC reinforcement learning (RL): given a policy class $\Pi$, how many rounds of interaction with an unknown MDP (with a potentially large state and …

Pessimism in the face of confounders: Provably efficient offline reinforcement learning in partially observable Markov decision processes

M Lu, Y Min, Z Wang, Z Yang - arXiv preprint arXiv:2205.13589, 2022 - arxiv.org
We study offline reinforcement learning (RL) in partially observable Markov decision
processes. In particular, we aim to learn an optimal policy from a dataset collected by a …

Timing as an Action: Learning When to Observe and Act

H Zhou, A Huang, K Azizzadenesheli… - International …, 2024 - proceedings.mlr.press
In standard reinforcement learning setups, the agent receives observations and performs
actions at evenly spaced intervals. However, in many real-world settings, observations are …

Harnessing density ratios for online reinforcement learning

P Amortila, DJ Foster, N Jiang, A Sekhari… - arXiv preprint arXiv …, 2024 - arxiv.org
The theories of offline and online reinforcement learning, despite having evolved in parallel,
have begun to show signs of the possibility for a unification, with algorithms and analysis …

Provably efficient CVaR RL in low-rank MDPs

Y Zhao, W Zhan, X Hu, H Leung, F Farnia… - arXiv preprint arXiv …, 2023 - arxiv.org
We study risk-sensitive Reinforcement Learning (RL), where we aim to maximize the Conditional Value at Risk (CVaR) with a fixed risk tolerance $\tau$. Prior theoretical work …

On the role of general function approximation in offline reinforcement learning

C Mao, Q Zhang, Z Wang, X Li - The Twelfth International …, 2024 - openreview.net
We study offline reinforcement learning (RL) with general function approximation. General
function approximation is a powerful tool for algorithm design and analysis, but its …

Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data

K Panaganti, A Wierman, E Mazumdar - arXiv preprint arXiv:2405.05468, 2024 - arxiv.org
The robust $\phi $-regularized Markov Decision Process (RRMDP) framework focuses on
designing control policies that are robust against parameter uncertainties due to mismatches …

Scalable Online Exploration via Coverability

P Amortila, DJ Foster, A Krishnamurthy - arXiv preprint arXiv:2403.06571, 2024 - arxiv.org
Exploration is a major challenge in reinforcement learning, especially for high-dimensional
domains that require function approximation. We propose exploration objectives--policy …

Offline Reinforcement Learning: Role of State Aggregation and Trajectory Data

Z Jia, A Rakhlin, A Sekhari, CY Wei - arXiv preprint arXiv:2403.17091, 2024 - arxiv.org
We revisit the problem of offline reinforcement learning with value function realizability but
without Bellman completeness. Previous work by Xie and Jiang (2021) and Foster et …