for large problems with discrete action spaces. However, many real-world problems involve
continuous action spaces, where MCTS is less effective. This
is mainly due to common practices such as coarse discretization of the entire action space
and failure to exploit local smoothness. In this paper, we introduce Value-Gradient UCT
(VG-UCT), which combines traditional MCTS with gradient-based optimization of action particles …