Related articles - Academic resource search

Conservative q-learning for offline reinforcement learning

A Kumar, A Zhou, G Tucker… - Advances in Neural …, 2020 - proceedings.neurips.cc

Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is
a key challenge for large-scale real-world applications. Offline RL algorithms promise to …

Cited by 1610 - Related articles - All 10 versions

[PDF] arxiv.org

Offline reinforcement learning with implicit q-learning

I Kostrikov, A Nair, S Levine - arXiv preprint arXiv:2110.06169, 2021 - arxiv.org

Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that
improves over the behavior policy that collected the dataset, while at the same time …

Cited by 609 - Related articles - All 6 versions

[PDF] neurips.cc

Cal-ql: Calibrated offline rl pre-training for efficient online fine-tuning

M Nakamoto, S Zhai, A Singh… - Advances in …, 2024 - proceedings.neurips.cc

A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization
from existing datasets followed by fast online fine-tuning with limited interaction. However …

Cited by 57 - Related articles - All 7 versions

[PDF] neurips.cc

Mildly conservative q-learning for offline reinforcement learning

J Lyu, X Ma, X Li, Z Lu - Advances in Neural Information …, 2022 - proceedings.neurips.cc

Offline reinforcement learning (RL) defines the task of learning from a static logged dataset
without continually interacting with the environment. The distribution shift between the …

Cited by 80 - Related articles - All 5 versions

[PDF] neurips.cc

Uncertainty-based offline reinforcement learning with diversified q-ensemble

G An, S Moon, JH Kim… - Advances in neural …, 2021 - proceedings.neurips.cc

Offline reinforcement learning (offline RL), which aims to find an optimal policy from a
previously collected static dataset, bears algorithmic difficulties due to function …

Cited by 231 - Related articles - All 7 versions

[PDF] mlr.press

Q-learning decision transformer: Leveraging dynamic programming for conditional sequence modelling in offline rl

T Yamagata, A Khalil… - … on Machine Learning, 2023 - proceedings.mlr.press

Recent works have shown that tackling offline reinforcement learning (RL) with a conditional
policy produces promising results. The Decision Transformer (DT) combines the conditional …

Cited by 49 - Related articles - All 9 versions

[PDF] neurips.cc

Online and offline reinforcement learning by planning with a learned model

J Schrittwieser, T Hubert, A Mandhane… - Advances in …, 2021 - proceedings.neurips.cc

Learning efficiently from small amounts of data has long been the focus of model-based
reinforcement learning, both for the online case when interacting with the environment, and …

Cited by 110 - Related articles - All 7 versions

[PDF] arxiv.org

Confidence-conditioned value functions for offline reinforcement learning

J Hong, A Kumar, S Levine - arXiv preprint arXiv:2212.04607, 2022 - arxiv.org

Offline reinforcement learning (RL) promises the ability to learn effective policies solely
using existing, static datasets, without any costly online interaction. To do so, offline RL …

Cited by 18 - Related articles - All 4 versions

[PDF] mlr.press

Emaq: Expected-max q-learning operator for simple yet effective offline and online rl

SKS Ghasemipour, D Schuurmans… - … on Machine Learning, 2021 - proceedings.mlr.press

Off-policy reinforcement learning (RL) holds the promise of sample-efficient learning of
decision-making policies by leveraging past experience. However, in the offline RL setting …

Cited by 118 - Related articles - All 6 versions

[PDF] arxiv.org

Offline rl with no ood actions: In-sample learning via implicit value regularization

H Xu, L Jiang, J Li, Z Yang, Z Wang, VWK Chan… - arXiv preprint arXiv …, 2023 - arxiv.org

Most offline reinforcement learning (RL) methods suffer from the trade-off between improving
the policy to surpass the behavior policy and constraining the policy to limit the deviation …

Cited by 50 - Related articles - All 4 versions
