M Tomar, Y Efroni, M Ghavamzadeh - arXiv preprint arXiv:1910.02919, 2019 - arxiv.org
Multi-step greedy policies have been extensively used in model-based reinforcement
learning (RL), both when a model of the environment is available (e.g., in the game of Go) …