Policy improvement by planning with Gumbel

I Danihelka, A Guez, J Schrittwieser… - … Conference on Learning …, 2022 - openreview.net
AlphaZero is a powerful reinforcement learning algorithm based on approximate policy
iteration and tree search. However, AlphaZero can fail to improve its policy network, if not …

Combinatorial multi-armed bandits for real-time strategy games

S Ontañón - Journal of Artificial Intelligence Research, 2017 - jair.org
Games with large branching factors pose a significant challenge for game tree search
algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo …

Probabilistic programs as an action description language

RI Brafman, D Tolpin, O Wertheim - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Action description languages (ADLs), such as STRIPS, PDDL, and RDDL, specify the input
format for planning algorithms. Unfortunately, their syntax is familiar to planning experts only …

Selecting computations: Theory and applications

N Hay, S Russell, D Tolpin, SE Shimony - arXiv preprint arXiv:1408.2048, 2014 - arxiv.org
Sequential decision problems are often approximately solvable by simulating possible future
action sequences. Metalevel decision procedures have been developed for selecting which …

Maximum entropy Monte-Carlo planning

C Xiao, R Huang, J Mei… - Advances in Neural …, 2019 - proceedings.neurips.cc
We develop a new algorithm for online planning in large scale sequential decision problems
that improves upon the worst case efficiency of UCT. The idea is to augment Monte-Carlo …

Monte Carlo tree search with Boltzmann exploration

M Painter, M Baioumy, N Hawes… - Advances in Neural …, 2024 - proceedings.neurips.cc
Monte-Carlo Tree Search (MCTS) methods, such as Upper Confidence Bound
applied to Trees (UCT), are instrumental to automated planning techniques. However, UCT …

Planning in Markov decision processes with gap-dependent sample complexity

A Jonsson, E Kaufmann, P Ménard… - Advances in …, 2020 - proceedings.neurips.cc
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for
planning in a Markov Decision Process in which transitions have a finite support. We prove …

Learning decision trees through Monte Carlo tree search: An empirical evaluation

C Nunes, M De Craene, H Langet… - … : Data Mining and …, 2020 - Wiley Online Library
Decision trees (DTs) are a widely used prediction tool, owing to their interpretability.
Standard learning methods follow a locally optimal approach that trades off prediction …

Deep imagination is a close to optimal policy for planning in large decision trees under limited resources

C Mastrogiuseppe, R Moreno-Bote - Scientific reports, 2022 - nature.com
Many decisions involve choosing an uncertain course of action in deep and wide decision
trees, as when we plan to visit an exotic country for vacation. In these cases, exhaustive …

Optimizing the depth and the direction of prospective planning using information values

CE Sezener, A Dezfouli, M Keramati - PLoS computational biology, 2019 - journals.plos.org
Evaluating the future consequences of actions is achievable by simulating a mental search
tree into the future. Expanding deep trees, however, is computationally taxing. Therefore …