OptiDICE: Offline policy optimization via stationary distribution correction estimation

M Zare, PM Kebria, A Khosravi… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

In recent years, the development of robotics and artificial intelligence (AI) systems has been
nothing short of remarkable. As these systems continue to evolve, they are being utilized in …

Cited by 62 · Related articles · All 2 versions

[PDF] mlr.press

Offline reinforcement learning with realizability and single-policy concentrability

W Zhan, B Huang, A Huang… - … on Learning Theory, 2022 - proceedings.mlr.press

Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong
assumptions on both the function classes (e.g., Bellman-completeness) and the data …

Cited by 128 · Related articles · All 6 versions

[PDF] neurips.cc

The curious price of distributional robustness in reinforcement learning with a generative model

L Shi, G Li, Y Wei, Y Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc

This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …

Cited by 38 · Related articles · All 10 versions

[PDF] neurips.cc

Beyond uniform sampling: Offline reinforcement learning with imbalanced datasets

ZW Hong, A Kumar, S Karnik… - Advances in …, 2023 - proceedings.neurips.cc

Offline reinforcement learning (RL) enables learning a decision-making policy without
interaction with the environment. This makes it particularly beneficial in situations where …

Cited by 14 · Related articles · All 6 versions

[PDF] neurips.cc

VOCE: Variational optimization with conservative estimation for offline safe reinforcement learning

J Guan, G Chen, J Ji, L Yang… - Advances in Neural …, 2024 - proceedings.neurips.cc

Offline safe reinforcement learning (RL) algorithms promise to learn policies that satisfy
safety constraints directly in offline datasets without interacting with the environment. This …

Cited by 11 · Related articles · All 4 versions

[PDF] neurips.cc

Design from policies: Conservative test-time adaptation for offline policy optimization

J Liu, H Zhang, Z Zhuang, Y Kang… - Advances in Neural …, 2024 - proceedings.neurips.cc

In this work, we decouple the iterative bi-level offline RL (value estimation and policy
extraction) from the offline training phase, forming a non-iterative bi-level paradigm and …

Cited by 10 · Related articles · All 5 versions

[PDF] mlr.press

Reinforcement learning in low-rank MDPs with density features

A Huang, J Chen, N Jiang - International Conference on …, 2023 - proceedings.mlr.press

MDPs with low-rank transitions—that is, the transition matrix can be factored into the product
of two matrices, left and right—is a highly representative structure that enables tractable …

Cited by 19 · Related articles · All 8 versions

[PDF] mlr.press

Revisiting the linear-programming framework for offline RL with general function approximation

AE Ozdaglar, S Pattathil, J Zhang… - … on Machine Learning, 2023 - proceedings.mlr.press

Offline reinforcement learning (RL) aims to find an optimal policy for sequential decision-making using a pre-collected dataset, without further interaction with the environment …

Cited by 27 · Related articles · All 6 versions

[PDF] arxiv.org

Optimal conservative offline RL with general function approximation via augmented Lagrangian

P Rashidinejad, H Zhu, K Yang, S Russell… - arXiv preprint arXiv …, 2022 - arxiv.org

Offline reinforcement learning (RL), which refers to decision-making from a previously-collected dataset of interactions, has received significant attention over the past years. Much …

Cited by 39 · Related articles · All 8 versions

[PDF] neurips.cc

Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression

JY Ma, J Yan, D Jayaraman… - Advances in neural …, 2022 - proceedings.neurips.cc

Offline goal-conditioned reinforcement learning (GCRL) promises general-purpose skill
learning in the form of reaching diverse goals from purely offline datasets. We propose …

Cited by 29 · Related articles · All 6 versions
