Corruption-robust offline reinforcement learning with general function approximation

C Ye, R Yang, Q Gu, T Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc
We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …

Exploit reward shifting in value-based Deep-RL: Optimistic curiosity-based exploration and conservative exploitation via linear reward shaping

H Sun, L Han, R Yang, X Ma… - Advances in Neural …, 2022 - proceedings.neurips.cc
In this work, we study the simple yet universally applicable case of reward shaping in value-
based Deep Reinforcement Learning (DRL). We show that reward shifting in the form of a …
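The linear reward shaping the snippet refers to can be sketched as follows. This is an illustrative sketch, not the paper's code; the names `shift_reward`, `k`, and `b` are assumptions, following the general idea that a constant shift offsets every discounted return by b / (1 - gamma), acting like an optimistic (b > 0) or conservative (b < 0) value initialization.

```python
# Illustrative sketch (assumed names, not the paper's code): linear reward
# shaping r'(s, a) = k * r(s, a) + b. With discount factor gamma, a constant
# shift b offsets every return by b / (1 - gamma), which behaves like an
# optimistic (b > 0) or conservative (b < 0) initialization of value estimates.

def shift_reward(reward: float, k: float = 1.0, b: float = -1.0) -> float:
    """Apply the linear transformation r' = k * reward + b."""
    return k * reward + b

# A conservative (negative) shift applied to a short trajectory of rewards:
rewards = [0.0, 1.0, 0.5]
shifted = [shift_reward(r) for r in rewards]
# shifted == [-1.0, 0.0, -0.5]
```

In practice such a shift would typically be applied inside an environment or replay-buffer wrapper so the learner only ever sees the transformed reward.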

Anti-exploration by random network distillation

A Nikulin, V Kurenkov, D Tarasov… - … on Machine Learning, 2023 - proceedings.mlr.press
Despite the success of Random Network Distillation (RND) in various domains, it was shown
as not discriminative enough to be used as an uncertainty estimator for penalizing out-of …
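The RND mechanism the snippet discusses can be sketched in a few lines. This is a minimal NumPy illustration under assumed shapes and names (`W_target`, `W_pred`, `rnd_bonus`), not the paper's implementation: a frozen random "target" network is distilled into a trainable predictor, and the prediction error serves as an out-of-distribution signal, since states unlike the training data have not been fitted.

```python
# Minimal sketch of Random Network Distillation (RND) as an uncertainty
# signal (illustrative NumPy version; shapes and names are assumptions).
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_OUT = 4, 8

# Frozen random target network and a trainable predictor (single linear layer).
W_target = rng.normal(size=(D_IN, D_OUT))
W_pred = np.zeros((D_IN, D_OUT))

def rnd_bonus(x: np.ndarray) -> float:
    """Prediction error of the predictor vs. the frozen target network.
    High error = state unlike the data the predictor was trained on."""
    return float(np.mean((x @ W_pred - x @ W_target) ** 2))

def train_step(x: np.ndarray, lr: float = 0.05) -> None:
    """One gradient step fitting the predictor to the target on sample x."""
    global W_pred
    err = x @ W_pred - x @ W_target      # shape (D_OUT,)
    W_pred -= lr * np.outer(x, err)      # gradient direction of the MSE

# After training on a state, its bonus shrinks toward zero.
x_seen = rng.normal(size=D_IN)
before = rnd_bonus(x_seen)
for _ in range(300):
    train_step(x_seen)
after = rnd_bonus(x_seen)
# In the anti-exploration setting, this bonus is *subtracted* from the reward
# to penalize out-of-distribution actions rather than to encourage them.
```

The same quantity that drives curiosity in online RL is repurposed here as a pessimism penalty, which is why its discriminative power as an uncertainty estimator matters.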

Accountability in offline reinforcement learning: Explaining decisions with a corpus of examples

H Sun, A Hüyük, D Jarrett… - Advances in Neural …, 2024 - proceedings.neurips.cc
Learning controllers with offline data in decision-making systems is an essential area of
research due to its potential to reduce the risks of deployment in real-world systems …

Revisiting the minimalist approach to offline reinforcement learning

D Tarasov, V Kurenkov, A Nikulin… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent years have witnessed significant advancements in offline reinforcement learning
(RL), resulting in the development of numerous algorithms with varying degrees of …

What is essential for unseen goal generalization of offline goal-conditioned RL?

R Yang, L Yong, X Ma, H Hu… - … on Machine Learning, 2023 - proceedings.mlr.press
Offline goal-conditioned RL (GCRL) offers a way to train general-purpose agents from fully
offline datasets. In addition to being conservative within the dataset, the generalization …

Survival instinct in offline reinforcement learning

A Li, D Misra, A Kolobov… - Advances in Neural …, 2024 - proceedings.neurips.cc
We present a novel observation about the behavior of offline reinforcement learning (RL)
algorithms: on many benchmark datasets, offline RL can produce well-performing and safe …

Query-dependent prompt evaluation and optimization with offline inverse RL

H Sun, A Hüyük, M van der Schaar - The Twelfth International …, 2023 - openreview.net
In this study, we aim to enhance the arithmetic reasoning ability of Large Language Models
(LLMs) through zero-shot prompt optimization. We identify a previously overlooked objective …

Behavior proximal policy optimization

Z Zhuang, K Lei, J Liu, D Wang, Y Guo - arXiv preprint arXiv:2302.11312, 2023 - arxiv.org
Offline reinforcement learning (RL) is a challenging setting where existing off-policy actor-
critic methods perform poorly due to the overestimation of out-of-distribution state-action …

What is flagged in uncertainty quantification? Latent density models for uncertainty categorization

H Sun, B van Breugel, J Crabbé… - Advances in …, 2023 - proceedings.neurips.cc
Uncertainty quantification (UQ) is essential for creating trustworthy machine learning
models. Recent years have seen a steep rise in UQ methods that can flag suspicious …