D Zhou, Q Gu - Advances in neural information processing …, 2022 - proceedings.neurips.cc
Recent studies have shown that episodic reinforcement learning (RL) is not more difficult
than bandits, even with a long planning horizon and unknown state transitions. However …