H Furuta, Y Matsuo, SS Gu - arXiv e-prints, 2021 - ui.adsabs.harvard.edu
How to extract as much learning signal as possible from each trajectory has been a key problem in
reinforcement learning (RL), where sample inefficiency has posed serious challenges for …