Corruption-robust offline reinforcement learning with general function approximation

C Ye, R Yang, Q Gu, T Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc
We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …

Exploit reward shifting in value-based Deep-RL: Optimistic curiosity-based exploration and conservative exploitation via linear reward shaping

H Sun, L Han, R Yang, X Ma… - Advances in Neural …, 2022 - proceedings.neurips.cc
In this work, we study the simple yet universally applicable case of reward shaping in value-
based Deep Reinforcement Learning (DRL). We show that reward shifting in the form of a …
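The linear reward shaping the snippet refers to can be sketched as follows. This is an illustrative sketch, not the paper's code; the names `shift_reward`, `k`, and `b` are assumptions, following the general idea that a constant shift offsets every discounted return by b / (1 - gamma), acting like an optimistic (b > 0) or conservative (b < 0) value initialization.

```python
# Illustrative sketch (assumed names, not the paper's code): linear reward
# shaping r'(s, a) = k * r(s, a) + b. With discount factor gamma, a constant
# shift b offsets every return by b / (1 - gamma), which behaves like an
# optimistic (b > 0) or conservative (b < 0) initialization of value estimates.

def shift_reward(reward: float, k: float = 1.0, b: float = -1.0) -> float:
    """Apply the linear transformation r' = k * reward + b."""
    return k * reward + b

# A conservative (negative) shift applied to a short trajectory of rewards:
rewards = [0.0, 1.0, 0.5]
shifted = [shift_reward(r) for r in rewards]
# shifted == [-1.0, 0.0, -0.5]
```

In practice such a shift would typically be applied inside an environment or replay-buffer wrapper so the learner only ever sees the transformed reward.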

Anti-exploration by random network distillation

A Nikulin, V Kurenkov, D Tarasov… - … on Machine Learning, 2023 - proceedings.mlr.press
Despite the success of Random Network Distillation (RND) in various domains, it was shown
as not discriminative enough to be used as an uncertainty estimator for penalizing out-of …
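The RND mechanism the snippet discusses can be sketched in a few lines. This is a minimal NumPy illustration under assumed shapes and names (`W_target`, `W_pred`, `rnd_bonus`), not the paper's implementation: a frozen random "target" network is distilled into a trainable predictor, and the prediction error serves as an out-of-distribution signal, since states unlike the training data have not been fitted.

```python
# Minimal sketch of Random Network Distillation (RND) as an uncertainty
# signal (illustrative NumPy version; shapes and names are assumptions).
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_OUT = 4, 8

# Frozen random target network and a trainable predictor (single linear layer).
W_target = rng.normal(size=(D_IN, D_OUT))
W_pred = np.zeros((D_IN, D_OUT))

def rnd_bonus(x: np.ndarray) -> float:
    """Prediction error of the predictor vs. the frozen target network.
    High error = state unlike the data the predictor was trained on."""
    return float(np.mean((x @ W_pred - x @ W_target) ** 2))

def train_step(x: np.ndarray, lr: float = 0.05) -> None:
    """One gradient step fitting the predictor to the target on sample x."""
    global W_pred
    err = x @ W_pred - x @ W_target      # shape (D_OUT,)
    W_pred -= lr * np.outer(x, err)      # gradient direction of the MSE

# After training on a state, its bonus shrinks toward zero.
x_seen = rng.normal(size=D_IN)
before = rnd_bonus(x_seen)
for _ in range(300):
    train_step(x_seen)
after = rnd_bonus(x_seen)
# In the anti-exploration setting, this bonus is *subtracted* from the reward
# to penalize out-of-distribution actions rather than to encourage them.
```

The same quantity that drives curiosity in online RL is repurposed here as a pessimism penalty, which is why its discriminative power as an uncertainty estimator matters.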

Accountability in offline reinforcement learning: Explaining decisions with a corpus of examples

H Sun, A Hüyük, D Jarrett… - Advances in Neural …, 2024 - proceedings.neurips.cc
Learning controllers with offline data in decision-making systems is an essential area of
research due to its potential to reduce the risks of deployment in real-world systems …

Revisiting the minimalist approach to offline reinforcement learning

D Tarasov, V Kurenkov, A Nikulin… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent years have witnessed significant advancements in offline reinforcement learning
(RL), resulting in the development of numerous algorithms with varying degrees of …

What is essential for unseen goal generalization of offline goal-conditioned RL?

R Yang, L Yong, X Ma, H Hu… - … on Machine Learning, 2023 - proceedings.mlr.press
Offline goal-conditioned RL (GCRL) offers a way to train general-purpose agents from fully
offline datasets. In addition to being conservative within the dataset, the generalization …

Survival instinct in offline reinforcement learning

A Li, D Misra, A Kolobov… - Advances in Neural …, 2024 - proceedings.neurips.cc
We present a novel observation about the behavior of offline reinforcement learning (RL)
algorithms: on many benchmark datasets, offline RL can produce well-performing and safe …

Query-dependent prompt evaluation and optimization with offline inverse RL

H Sun, A Hüyük, M van der Schaar - The Twelfth International …, 2023 - openreview.net
In this study, we aim to enhance the arithmetic reasoning ability of Large Language Models
(LLMs) through zero-shot prompt optimization. We identify a previously overlooked objective …

Behavior proximal policy optimization

Z Zhuang, K Lei, J Liu, D Wang, Y Guo - arXiv preprint arXiv:2302.11312, 2023 - arxiv.org
Offline reinforcement learning (RL) is a challenging setting where existing off-policy actor-
critic methods perform poorly due to the overestimation of out-of-distribution state-action …

What is flagged in uncertainty quantification? Latent density models for uncertainty categorization

H Sun, B van Breugel, J Crabbé… - Advances in …, 2023 - proceedings.neurips.cc
Uncertainty quantification (UQ) is essential for creating trustworthy machine learning
models. Recent years have seen a steep rise in UQ methods that can flag suspicious …