Replay in the brain has been viewed as rehearsal or, more recently, as sampling from a transition model. Here, we propose a new hypothesis: that replay is able to implement a form …
Recent efforts have incorporated large language models (LLMs) with external resources (eg, the Internet) or internal control flows (eg, prompt chaining) for tasks requiring grounding or …
Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence. Two key approaches to this …
Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding. We therefore present a systematic and …
A hallmark of human intelligence is the ability to plan multiple steps into the future,. Despite decades of research,–, it is still debated whether skilled decision-makers plan more steps …
We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the …
AlphaZero is a powerful reinforcement learning algorithm based on approximate policy iteration and tree search. However, AlphaZero can fail to improve its policy network, if not …
Model-based planning is often thought to be necessary for deep, careful reasoning and generalization in artificial agents. While recent successes of model-based reinforcement …
Abstract The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to groundbreaking results in artificial intelligence. However, AlphaZero, the …