Reinforcement learning algorithms: A brief survey

AK Shakya, G Pillai, S Chakrabarty - Expert Systems with Applications, 2023 - Elsevier
Reinforcement Learning (RL) is a machine learning (ML) technique to learn sequential
decision-making in complex problems. RL is inspired by trial-and-error based human/animal …

Replay and compositional computation

Z Kurth-Nelson, T Behrens, G Wayne, K Miller… - Neuron, 2023 - cell.com
Replay in the brain has been viewed as rehearsal or, more recently, as sampling from a
transition model. Here, we propose a new hypothesis: that replay is able to implement a form …

Cognitive architectures for language agents

TR Sumers, S Yao, K Narasimhan… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent efforts have incorporated large language models (LLMs) with external resources (e.g.,
the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or …

Model-based reinforcement learning: A survey

TM Moerland, J Broekens, A Plaat… - … and Trends® in …, 2023 - nowpublishers.com
Sequential decision making, commonly formalized as Markov Decision Process (MDP)
optimization, is an important challenge in artificial intelligence. Two key approaches to this …

Revisiting fundamentals of experience replay

W Fedus, P Ramachandran… - International …, 2020 - proceedings.mlr.press
Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but
there remain significant gaps in our understanding. We therefore present a systematic and …

Expertise increases planning depth in human gameplay

B van Opheusden, I Kuperwajs, G Galbiati, Z Bnaya… - Nature, 2023 - nature.com
A hallmark of human intelligence is the ability to plan multiple steps into the future. Despite
decades of research, it is still debated whether skilled decision-makers plan more steps …

Muesli: Combining improvements in policy optimization

M Hessel, I Danihelka, F Viola, A Guez… - International …, 2021 - proceedings.mlr.press
We propose a novel policy update that combines regularized policy optimization with model
learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the …

Policy improvement by planning with Gumbel

I Danihelka, A Guez, J Schrittwieser… - … Conference on Learning …, 2022 - openreview.net
AlphaZero is a powerful reinforcement learning algorithm based on approximate policy
iteration and tree search. However, AlphaZero can fail to improve its policy network, if not …

On the role of planning in model-based deep reinforcement learning

JB Hamrick, AL Friesen, F Behbahani, A Guez… - arXiv preprint arXiv …, 2020 - arxiv.org
Model-based planning is often thought to be necessary for deep, careful reasoning and
generalization in artificial agents. While recent successes of model-based reinforcement …

Monte-Carlo tree search as regularized policy optimization

JB Grill, F Altché, Y Tang, T Hubert… - International …, 2020 - proceedings.mlr.press
The combination of Monte-Carlo tree search (MCTS) with deep reinforcement
learning has led to groundbreaking results in artificial intelligence. However, AlphaZero, the …