C Dann, Y Mansour, M Mohri, A Sekhari… - Proceedings of the 34th …, 2020 - dl.acm.org
We study RL in the tabular MDP setting where the agent receives additional observations
per step in the form of transition samples. Such additional observations can be provided in …