that encourage reinforcement-learning (RL) algorithms to work effectively with real-life
data?” First, we address the problem of overfitting. RL algorithms are often tuned to the
specific environments in which they are applied, which calls into question whether an
algorithm that works in one environment will work in others. We propose a methodology to
evaluate algorithms on distributions of environments, as opposed to a single environment …