B Chan, A Leung, J Bergstra - arXiv preprint arXiv:2410.14957, 2024 - arxiv.org
Offline-to-online reinforcement learning (O2O RL) aims to obtain a continually improving policy as it interacts with the environment, while ensuring the initial policy behaviour is …