J Xu, R Yang, F Luo, M Fang, B Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning policies from offline datasets through offline reinforcement learning (RL) holds promise for scaling data-driven decision-making and avoiding unsafe and costly online …
Reverse Kullback-Leibler (KL) regularization has emerged as a predominant technique for enhancing policy optimization in reinforcement learning (RL) and reinforcement …