A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

Y Zhang, J Liu, C Li, Y Niu, Y Yang, Y Liu… - Proceedings of the …, 2024 - ojs.aaai.org
Offline-to-online Reinforcement Learning (O2O RL) aims to improve the performance of
an offline pretrained policy using only a few online samples. Built on offline RL algorithms, most …