Generalization and regularization in DQN

K Khetarpal, M Riemer, I Rish, D Precup - Journal of Artificial Intelligence …, 2022 - jair.org

In this article, we aim to provide a literature review of different formulations and approaches
to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We …

Cited by 320 Related articles All 9 versions

[PDF] github.io

The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - arXiv preprint arXiv …, 2023 - arxiv.org

For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing
the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are …

Cited by 646 Related articles All 4 versions

[PDF] mlr.press

Scaling laws for reward model overoptimization

L Gao, J Schulman, J Hilton - International Conference on …, 2023 - proceedings.mlr.press

In reinforcement learning from human feedback, it is common to optimize against a reward
model trained to predict human preferences. Because the reward model is an imperfect …

Cited by 352 Related articles All 7 versions

[PDF] jair.org

A survey of zero-shot generalisation in deep reinforcement learning

R Kirk, A Zhang, E Grefenstette, T Rocktäschel - Journal of Artificial …, 2023 - jair.org

The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning (RL) aims to
produce RL algorithms whose policies generalise well to novel unseen situations at …

Cited by 394 Related articles All 9 versions

[PDF] mlr.press

Leveraging procedural generation to benchmark reinforcement learning

K Cobbe, C Hesse, J Hilton… - … conference on machine …, 2020 - proceedings.mlr.press

We introduce Procgen Benchmark, a suite of 16 procedurally generated game-like
environments designed to benchmark both sample efficiency and generalization in …

Cited by 623 Related articles All 6 versions

[PDF] mlr.press

Quantifying generalization in reinforcement learning

K Cobbe, O Klimov, C Hesse, T Kim… - … on machine learning, 2019 - proceedings.mlr.press

In this paper, we investigate the problem of overfitting in deep reinforcement learning.
Among the most common benchmarks in RL, it is customary to use the same environments …

Cited by 760 Related articles All 4 versions

[PDF] arxiv.org

A comprehensive survey of data augmentation in visual reinforcement learning

G Ma, Z Wang, Z Yuan, X Wang, B Yuan… - arXiv preprint arXiv …, 2022 - arxiv.org

Visual reinforcement learning (RL), which makes decisions directly from high-dimensional
visual inputs, has demonstrated significant potential in various domains. However …

Cited by 31 Related articles All 3 versions

[PDF] arxiv.org

Contrastive behavioral similarity embeddings for generalization in reinforcement learning

R Agarwal, MC Machado, PS Castro… - arXiv preprint arXiv …, 2021 - arxiv.org

Reinforcement learning methods trained on few environments rarely learn policies that
generalize to unseen environments. To improve generalization, we incorporate the inherent …

Cited by 207 Related articles All 11 versions

[PDF] neurips.cc

Stabilizing deep q-learning with convnets and vision transformers under data augmentation

N Hansen, H Su, X Wang - Advances in neural information …, 2021 - proceedings.neurips.cc

While agents trained by Reinforcement Learning (RL) can solve increasingly challenging
tasks directly from visual observations, generalizing learned skills to novel environments …

Cited by 140 Related articles All 10 versions

[PDF] arxiv.org

Deep reinforcement learning

SE Li - Reinforcement learning for sequential decision and …, 2023 - Springer

Similar to humans, RL agents use interactive learning to successfully obtain satisfactory
decision strategies. However, in many cases, it is desirable to learn directly from …

Cited by 418 Related articles All 9 versions
