C Wang, X Yu, C Bai, Q Zhang, Z Wang - Science China Information …, 2024 - Springer
In reinforcement learning (RL), training a policy from scratch with online experience can be
inefficient because exploration is difficult. Recently, offline RL has provided a promising …