Surprise-based intrinsic motivation for deep reinforcement learning

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer

Abstract Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Cited by 589 Related articles All 2 versions

[PDF] arxiv.org

Exploration in deep reinforcement learning: A survey

P Ladosz, L Weng, M Kim, H Oh - Information Fusion, 2022 - Elsevier

This paper reviews exploration techniques in deep reinforcement learning. Exploration
techniques are of primary importance when solving sparse reward problems. In sparse …

Cited by 360 Related articles All 5 versions

[PDF] arxiv.org

Emergent tool use from multi-agent autocurricula

B Baker, I Kanitscheider, T Markov, Y Wu… - arXiv preprint arXiv …, 2019 - arxiv.org

Through multi-agent competition, the simple objective of hide-and-seek, and standard
reinforcement learning algorithms at scale, we find that agents create a self-supervised …

Cited by 904 Related articles All 3 versions

[PDF] jair.org

Towards continual reinforcement learning: A review and perspectives

K Khetarpal, M Riemer, I Rish, D Precup - Journal of Artificial Intelligence …, 2022 - jair.org

In this article, we aim to provide a literature review of different formulations and approaches
to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We …

Cited by 328 Related articles All 9 versions

[PDF] mlr.press

Planning to explore via self-supervised world models

R Sekar, O Rybkin, K Daniilidis… - International …, 2020 - proceedings.mlr.press

Reinforcement learning allows solving complex tasks, however, the learning tends to be
task-specific and the sample efficiency remains a challenge. We present Plan2Explore, a self …

Cited by 455 Related articles All 8 versions

[PDF] ed.ac.uk

Exploration by random network distillation

Y Burda, H Edwards, A Storkey, O Klimov - arXiv preprint arXiv …, 2018 - arxiv.org

We introduce an exploration bonus for deep reinforcement learning methods that is easy to
implement and adds minimal overhead to the computation performed. The bonus is the error …

Cited by 1564 Related articles All 10 versions

[PDF] nowpublishers.com

Model-based reinforcement learning: A survey

TM Moerland, J Broekens, A Plaat… - … and Trends® in …, 2023 - nowpublishers.com

Sequential decision making, commonly formalized as Markov Decision Process (MDP)
optimization, is an important challenge in artificial intelligence. Two key approaches to this …

Cited by 908 Related articles All 17 versions

[PDF] arxiv.org

Large-scale study of curiosity-driven learning

Y Burda, H Edwards, D Pathak, A Storkey… - arXiv preprint arXiv …, 2018 - arxiv.org

Reinforcement learning algorithms rely on carefully engineering environment rewards that
are extrinsic to the agent. However, annotating each environment with hand-designed …

Cited by 914 Related articles All 9 versions

[PDF] mlr.press

APS: Active pretraining with successor features

H Liu, P Abbeel - International Conference on Machine …, 2021 - proceedings.mlr.press

We introduce a new unsupervised pretraining objective for reinforcement learning. During
the unsupervised reward-free pretraining phase, the agent maximizes mutual information …

Cited by 150 Related articles All 5 versions

[PDF] mlr.press

Self-supervised exploration via disagreement

D Pathak, D Gandhi, A Gupta - International conference on …, 2019 - proceedings.mlr.press

Efficient exploration is a long-standing problem in sensorimotor learning. Major advances
have been demonstrated in noise-free, non-stochastic domains such as video games and …

Cited by 450 Related articles All 6 versions
