A comprehensive survey on pretrained foundation models: A history from bert to chatgpt

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer
Abstract Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (eg, BERT, ChatGPT, GPT-4) is …

Exploration in deep reinforcement learning: A survey

P Ladosz, L Weng, M Kim, H Oh - Information Fusion, 2022 - Elsevier
This paper reviews exploration techniques in deep reinforcement learning. Exploration
techniques are of primary importance when solving sparse reward problems. In sparse …

Emergent tool use from multi-agent autocurricula

B Baker, I Kanitscheider, T Markov, Y Wu… - arXiv preprint arXiv …, 2019 - arxiv.org
Through multi-agent competition, the simple objective of hide-and-seek, and standard
reinforcement learning algorithms at scale, we find that agents create a self-supervised …

Towards continual reinforcement learning: A review and perspectives

K Khetarpal, M Riemer, I Rish, D Precup - Journal of Artificial Intelligence …, 2022 - jair.org
In this article, we aim to provide a literature review of different formulations and approaches
to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We …

Planning to explore via self-supervised world models

R Sekar, O Rybkin, K Daniilidis… - International …, 2020 - proceedings.mlr.press
Reinforcement learning allows solving complex tasks, however, the learning tends to be task-
specific and the sample efficiency remains a challenge. We present Plan2Explore, a self …

Exploration by random network distillation

Y Burda, H Edwards, A Storkey, O Klimov - arXiv preprint arXiv …, 2018 - arxiv.org
We introduce an exploration bonus for deep reinforcement learning methods that is easy to
implement and adds minimal overhead to the computation performed. The bonus is the error …

Model-based reinforcement learning: A survey

TM Moerland, J Broekens, A Plaat… - … and Trends® in …, 2023 - nowpublishers.com
Sequential decision making, commonly formalized as Markov Decision Process (MDP)
optimization, is an important challenge in artificial intelligence. Two key approaches to this …

Large-scale study of curiosity-driven learning

Y Burda, H Edwards, D Pathak, A Storkey… - arXiv preprint arXiv …, 2018 - arxiv.org
Reinforcement learning algorithms rely on carefully engineering environment rewards that
are extrinsic to the agent. However, annotating each environment with hand-designed …

Aps: Active pretraining with successor features

H Liu, P Abbeel - International Conference on Machine …, 2021 - proceedings.mlr.press
We introduce a new unsupervised pretraining objective for reinforcement learning. During
the unsupervised reward-free pretraining phase, the agent maximizes mutual information …

Self-supervised exploration via disagreement

D Pathak, D Gandhi, A Gupta - International conference on …, 2019 - proceedings.mlr.press
Efficient exploration is a long-standing problem in sensorimotor learning. Major advances
have been demonstrated in noise-free, non-stochastic domains such as video games and …