Evolutionary methods for addressing the temporal sequence learning problem generally fall into policy search rather than value function optimization approaches. Several recent results have claimed that the policy search approach is at best inefficient at solving episodic 'goal seeking' tasks, i.e., tasks in which the reward is limited to describing properties associated with a successful outcome and provides no qualification for degrees of failure. This work demonstrates that such a conclusion is due to a lack of diversity in the training scenarios. We therefore return to the Acrobot 'height' task domain originally used to demonstrate complete failure in evolutionary policy search. This time, a very simple stochastic sampling heuristic for defining a population of training configurations is introduced. Benchmarking two recent evolutionary policy search algorithms - NeuroEvolution of Augmenting Topologies (NEAT) and Symbiotic Bid-Based (SBB) Genetic Programming - under this condition demonstrates solutions as effective as those returned by advanced value function methods. Moreover, this is achieved while remaining within the evaluation limit imposed by the original study.
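
To illustrate the kind of heuristic the abstract refers to, the following is a minimal Python sketch: a population of Acrobot training configurations is built by sampling initial joint angles and angular velocities uniformly at random. The state layout (theta1, theta2, dtheta1, dtheta2) follows the standard Acrobot formulation; the sampling ranges, population size, and function names here are illustrative assumptions, not the values or code used in the paper.

    import math
    import random

    def sample_configuration(rng):
        # One randomly sampled Acrobot start state; ranges are assumed
        # for illustration, not taken from the paper.
        return (
            rng.uniform(-math.pi, math.pi),  # theta1: angle of first link
            rng.uniform(-math.pi, math.pi),  # theta2: angle of second link
            rng.uniform(-1.0, 1.0),          # dtheta1: angular velocity, link 1
            rng.uniform(-1.0, 1.0),          # dtheta2: angular velocity, link 2
        )

    def sample_training_population(n, seed=0):
        # Draw a population of n training configurations.
        rng = random.Random(seed)
        return [sample_configuration(rng) for _ in range(n)]

    if __name__ == "__main__":
        # e.g., 50 start states; each candidate policy in the evolutionary
        # population would then be evaluated against (a subset of) these
        # configurations rather than a single fixed start state.
        for cfg in sample_training_population(50)[:3]:
            print(cfg)

Evaluating every candidate policy against such a sampled set, rather than a single fixed start state, is what supplies the training diversity that the abstract identifies as the missing ingredient in the original study.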