B Scherrer, B Lesner - Advances in Neural Information …, 2012 - proceedings.neurips.cc
We consider infinite-horizon stationary $\gamma$-discounted Markov Decision Processes,
for which it is known that there exists a stationary optimal policy. Using Value and Policy …