sampling of states and using local trajectory optimizers to globally optimize a policy and
associated value function. This combination allows us to replace a dense multidimensional
grid with a much sparser adaptive sampling of states. Our focus is on finding steady-state
policies for deterministic, time-invariant, discrete-time control problems with continuous
states and actions, of the kind often found in robotics. In this paper we show that we can now solve …
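To make the combination concrete, the following is a minimal sketch, not the paper's implementation: it assumes a discrete-time double integrator with quadratic costs (all dynamics, cost weights, horizons, and sample counts are illustrative), uses a finite-horizon Riccati recursion in the role of the local trajectory optimizer, and replaces a dense grid with a sparse random sample of states whose optimized costs-to-go feed a nearest-neighbor value approximation. The names `local_trajectory_optimizer`, `v_hat`, and `policy` are hypothetical helpers introduced here for illustration.

```python
import numpy as np

# Minimal sketch (not the authors' method): sparse random state sampling
# combined with a local trajectory optimizer, on an assumed discrete-time
# double integrator with quadratic costs.

# Deterministic, time-invariant, discrete-time dynamics: x' = A x + B u
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2) * dt          # state cost (illustrative)
R = np.eye(1) * 0.1 * dt    # action cost (illustrative)

def local_trajectory_optimizer(x0, horizon=200):
    """Finite-horizon Riccati recursion: for this linear-quadratic toy
    problem it stands in for a local optimizer such as DDP, returning the
    optimized cost-to-go from x0 and the first action of the trajectory."""
    P = Q.copy()
    gains = []
    for _ in range(horizon):                     # backward pass
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    u0 = -gains[-1] @ x0                         # first action of the plan
    value = float(x0 @ P @ x0)                   # optimized cost-to-go
    return value, u0

# Sparse random sampling of states instead of a dense multidimensional grid.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(50, 2))
values = np.array([local_trajectory_optimizer(x)[0] for x in states])

def policy(x):
    """Greedy one-step lookahead against a nearest-neighbor value
    approximation built from the sparse state samples."""
    def v_hat(xq):                               # nearest-neighbor value
        return values[np.argmin(np.linalg.norm(states - xq, axis=1))]
    candidates = np.linspace(-2.0, 2.0, 41)
    costs = [x @ Q @ x + u * R[0, 0] * u + v_hat(A @ x + B.flatten() * u)
             for u in candidates]
    return candidates[int(np.argmin(costs))]

print(policy(np.array([0.5, -0.2])))
```

Because the sampled states each carry a locally optimized cost-to-go rather than a one-step backup, the value approximation can remain useful even when the samples are far sparser than a grid would require; the nearest-neighbor fit above is the simplest possible stand-in for the local models used with such samples.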