based reinforcement learning more efficient and feasible in practice. We formulate Bayesian
hierarchical reinforcement learning as a partially observable semi-Markov decision process
(POSMDP). The main POSMDP task is partitioned into a hierarchy of POSMDP subtasks;
lower-level subtasks get solved first, then higher-level ones. We sample from a prior belief to
build an approximate model for each POSMDP, then solve using Monte Carlo Value …