Combining Q-learning and search with amortized value estimates

AK Shakya, G Pillai, S Chakrabarty - Expert Systems with Applications, 2023 - Elsevier

Reinforcement Learning (RL) is a machine learning (ML) technique to learn sequential
decision-making in complex problems. RL is inspired by trial-and-error based human/animal …

Cited by 140 · Related articles · All 2 versions

[HTML] cell.com Full View

[HTML] Replay and compositional computation

Z Kurth-Nelson, T Behrens, G Wayne, K Miller… - Neuron, 2023 - cell.com

Replay in the brain has been viewed as rehearsal or, more recently, as sampling from a
transition model. Here, we propose a new hypothesis: that replay is able to implement a form …

Cited by 43 · Related articles · All 11 versions

[PDF] arxiv.org

Cognitive architectures for language agents

TR Sumers, S Yao, K Narasimhan… - arXiv preprint arXiv …, 2023 - arxiv.org

Recent efforts have incorporated large language models (LLMs) with external resources (e.g.,
the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or …

Cited by 153 · Related articles · All 3 versions

[PDF] nowpublishers.com

Model-based reinforcement learning: A survey

TM Moerland, J Broekens, A Plaat… - … and Trends® in …, 2023 - nowpublishers.com

Sequential decision making, commonly formalized as Markov Decision Process (MDP)
optimization, is an important challenge in artificial intelligence. Two key approaches to this …

Cited by 824 · Related articles · All 17 versions

[PDF] mlr.press

Revisiting fundamentals of experience replay

W Fedus, P Ramachandran… - International …, 2020 - proceedings.mlr.press

Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but
there remain significant gaps in our understanding. We therefore present a systematic and …

Cited by 311 · Related articles · All 12 versions

[PDF] google.com

Expertise increases planning depth in human gameplay

B van Opheusden, I Kuperwajs, G Galbiati, Z Bnaya… - Nature, 2023 - nature.com

A hallmark of human intelligence is the ability to plan multiple steps into the future. Despite
decades of research, it is still debated whether skilled decision-makers plan more steps …

Cited by 43 · Related articles · All 7 versions

[PDF] mlr.press

Muesli: Combining improvements in policy optimization

M Hessel, I Danihelka, F Viola, A Guez… - International …, 2021 - proceedings.mlr.press

We propose a novel policy update that combines regularized policy optimization with model
learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the …

Cited by 83 · Related articles · All 5 versions

[PDF] openreview.net

Policy improvement by planning with Gumbel

I Danihelka, A Guez, J Schrittwieser… - … Conference on Learning …, 2022 - openreview.net

AlphaZero is a powerful reinforcement learning algorithm based on approximate policy
iteration and tree search. However, AlphaZero can fail to improve its policy network, if not …

Cited by 67 · Related articles · All 3 versions

[PDF] arxiv.org

On the role of planning in model-based deep reinforcement learning

JB Hamrick, AL Friesen, F Behbahani, A Guez… - arXiv preprint arXiv …, 2020 - arxiv.org

Model-based planning is often thought to be necessary for deep, careful reasoning and
generalization in artificial agents. While recent successes of model-based reinforcement …

Cited by 85 · Related articles · All 3 versions

[PDF] mlr.press

Monte-Carlo tree search as regularized policy optimization

JB Grill, F Altché, Y Tang, T Hubert… - International …, 2020 - proceedings.mlr.press

The combination of Monte-Carlo tree search (MCTS) with deep reinforcement
learning has led to groundbreaking results in artificial intelligence. However, AlphaZero, the …

Cited by 77 · Related articles · All 12 versions
