Understanding, predicting and better resolving Q-value divergence in offline-RL

M Gallici, M Fellows, B Ellis, B Pou, I Masmitja… - arXiv preprint arXiv …, 2024 - arxiv.org

Q-learning played a foundational role in the field of reinforcement learning (RL). However, TD
algorithms with off-policy data, such as Q-learning, or nonlinear function approximation like …

Cited by 4

Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

H Wang, Y Yue, R Lu, J Shi, A Zhao, S Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) have demonstrated great potential as generalist assistants,
showcasing powerful task understanding and problem-solving capabilities. To deploy LLMs …

Cited by 3

Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers

K Yan, AG Schwing, YX Wang - arXiv preprint arXiv:2410.24108, 2024 - arxiv.org

Decision Transformers have recently emerged as a new and compelling paradigm for offline
Reinforcement Learning (RL), completing a trajectory in an autoregressive way. While …

Offline-to-online Reinforcement Learning for Image-based Grasping with Scarce Demonstrations

B Chan, A Leung, J Bergstra - arXiv preprint arXiv:2410.14957, 2024 - arxiv.org

Offline-to-online reinforcement learning (O2O RL) aims to obtain a continually improving
policy as it interacts with the environment, while ensuring the initial behaviour is satisficing …

Leveraging Domain-Unlabeled Data in Offline Reinforcement Learning across Two Domains

S Nishimori, XQ Cai, J Ackermann… - arXiv preprint arXiv …, 2024 - arxiv.org

In this paper, we investigate an offline reinforcement learning (RL) problem where datasets
are collected from two domains. In this scenario, having datasets with domain labels …

Value-Aided Conditional Supervised Learning for Offline RL

J Kim, S Lee, W Kim, Y Sung - arXiv preprint arXiv:2402.02017, 2024 - arxiv.org

Offline reinforcement learning (RL) has seen notable advancements through return-
conditioned supervised learning (RCSL) and value-based methods, yet each approach …

Cited by 1

Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL

QW Luo, MK Xie, YW Wang, SJ Huang - The Thirty-eighth Annual … - openreview.net

Offline-to-online (O2O) reinforcement learning (RL) provides an effective means of
leveraging an offline pre-trained policy as initialization to improve performance rapidly with …
