Harnessing density ratios for online reinforcement learning

About 4 results (articles citing this work)

Online estimation via offline estimation: An information-theoretic framework

DJ Foster, Y Han, J Qian, A Rakhlin - arXiv preprint arXiv:2404.10122, 2024 - arxiv.org

The classical theory of statistical estimation aims to estimate a parameter of interest
under data generated from a fixed design ("offline estimation"), while the contemporary …

Cited by 2 · All 2 versions

Scalable Online Exploration via Coverability

P Amortila, DJ Foster, A Krishnamurthy - arXiv preprint arXiv:2403.06571, 2024 - arxiv.org

Exploration is a major challenge in reinforcement learning, especially for high-dimensional
domains that require function approximation. We propose exploration objectives--policy …

Cited by 1 · All 4 versions

Offline Reinforcement Learning: Role of State Aggregation and Trajectory Data

Z Jia, A Rakhlin, A Sekhari, CY Wei - arXiv preprint arXiv:2403.17091, 2024 - arxiv.org

We revisit the problem of offline reinforcement learning with value function realizability but
without Bellman completeness. Previous work by Xie and Jiang (2021) and Foster et …

Cited by 1 · All 3 versions

RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation

J Kwon, S Mannor, C Caramanis, Y Efroni - arXiv preprint arXiv …, 2024 - arxiv.org

In many real-world decision problems there is partially observed, hidden or latent
information that remains fixed throughout an interaction. Such decision problems can be …