MCTS based on simple regret

Policy improvement by planning with Gumbel

I Danihelka, A Guez, J Schrittwieser… - … Conference on Learning …, 2022 - openreview.net

AlphaZero is a powerful reinforcement learning algorithm based on approximate policy
iteration and tree search. However, AlphaZero can fail to improve its policy network, if not …

Cited by 54 · Related articles · All 3 versions

[PDF] jair.org

Combinatorial multi-armed bandits for real-time strategy games

S Ontanón - Journal of Artificial Intelligence Research, 2017 - jair.org

Games with large branching factors pose a significant challenge for game tree search
algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo …

Cited by 108 · Related articles · All 8 versions

[PDF] aaai.org

Probabilistic programs as an action description language

RI Brafman, D Tolpin, O Wertheim - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org

Actions description languages (ADLs), such as STRIPS, PDDL, and RDDL specify the input
format for planning algorithms. Unfortunately, their syntax is familiar to planning experts only …

Cited by 6 · Related articles · All 3 versions

[PDF] arxiv.org

Selecting computations: Theory and applications

N Hay, S Russell, D Tolpin, SE Shimony - arXiv preprint arXiv:1408.2048, 2014 - arxiv.org

Sequential decision problems are often approximately solvable by simulating possible future
action sequences. Metalevel decision procedures have been developed for selecting which …

Cited by 94 · Related articles · All 20 versions

[PDF] neurips.cc

Maximum entropy Monte-Carlo planning

C Xiao, R Huang, J Mei… - Advances in Neural …, 2019 - proceedings.neurips.cc

We develop a new algorithm for online planning in large scale sequential decision problems
that improves upon the worst case efficiency of UCT. The idea is to augment Monte-Carlo …

Cited by 39 · Related articles · All 5 versions

[PDF] neurips.cc

Monte Carlo tree search with Boltzmann exploration

M Painter, M Baioumy, N Hawes… - Advances in Neural …, 2024 - proceedings.neurips.cc

Monte-Carlo Tree Search (MCTS) methods, such as Upper Confidence Bound
applied to Trees (UCT), are instrumental to automated planning techniques. However, UCT …

Cited by 3 · Related articles · All 8 versions

[PDF] neurips.cc

Planning in Markov decision processes with gap-dependent sample complexity

A Jonsson, E Kaufmann, P Ménard… - Advances in …, 2020 - proceedings.neurips.cc

We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for
planning in a Markov Decision Process in which transitions have a finite support. We prove …

Cited by 31 · Related articles · All 12 versions

[PDF] google.com

Learning decision trees through Monte Carlo tree search: An empirical evaluation

C Nunes, M De Craene, H Langet… - … : Data Mining and …, 2020 - Wiley Online Library

Decision trees (DTs) are a widely used prediction tool, owing to their interpretability.
Standard learning methods follow a locally optimal approach that trades off prediction …

Cited by 9 · Related articles · All 3 versions

[PDF] nature.com

Deep imagination is a close to optimal policy for planning in large decision trees under limited resources

C Mastrogiuseppe, R Moreno-Bote - Scientific reports, 2022 - nature.com

Many decisions involve choosing an uncertain course of action in deep and wide decision
trees, as when we plan to visit an exotic country for vacation. In these cases, exhaustive …

Cited by 6 · Related articles · All 8 versions

[PDF] plos.org

Optimizing the depth and the direction of prospective planning using information values

CE Sezener, A Dezfouli, M Keramati - PLoS computational biology, 2019 - journals.plos.org

Evaluating the future consequences of actions is achievable by simulating a mental search
tree into the future. Expanding deep trees, however, is computationally taxing. Therefore …

Cited by 28 · Related articles · All 23 versions