Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm

M Lu, H Zhong, T Zhang, J Blanchet - arXiv preprint arXiv:2404.03578, 2024 - arxiv.org
The sim-to-real gap, which represents the disparity between training and testing
environments, poses a significant challenge in reinforcement learning (RL). A promising …
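
Background note (standard robust-MDP formulation, not drawn from the truncated abstract above): distributionally robust RL is usually posed over an uncertainty set of transition kernels. For a policy $\pi$, a nominal kernel $P^{o}$, and an $(s,a)$-rectangular uncertainty set $\mathcal{P}(s,a)$ (e.g., a KL or total-variation ball of radius $\rho$ around $P^{o}(\cdot \mid s,a)$), the robust value is

$$ V^{\pi}_{\mathrm{rob}}(s) \;=\; \inf_{P \in \bigotimes_{s,a} \mathcal{P}(s,a)} \mathbb{E}_{\pi, P}\Big[ \textstyle\sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t) \,\Big|\, s_0 = s \Big], $$

and the learner seeks a policy maximizing $V^{\pi}_{\mathrm{rob}}$. The paper's exact uncertainty set and interactive data-collection protocol may differ.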

Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling

J Xu, R Yang, F Luo, M Fang, B Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning policies from offline datasets through offline reinforcement learning (RL) holds
promise for scaling data-driven decision-making and avoiding unsafe and costly online …

Sharp Analysis for KL-Regularized Contextual Bandits and RLHF

H Zhao, C Ye, Q Gu, T Zhang - arXiv preprint arXiv:2411.04625, 2024 - arxiv.org
Reverse Kullback-Leibler (KL) regularization has emerged as a predominant technique
used to enhance policy optimization in reinforcement learning (RL) and reinforcement …
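
Background note (standard formulation, not drawn from the truncated abstract above): reverse-KL-regularized policy optimization, as commonly used in RLHF, maximizes expected reward minus a KL penalty to a reference policy $\pi_{\mathrm{ref}}$ with coefficient $\beta$:

$$ \max_{\pi}\; \mathbb{E}_{x \sim \mathcal{D}}\Big[ \mathbb{E}_{y \sim \pi(\cdot \mid x)}\big[ r(x, y) \big] \;-\; \beta\, \mathrm{KL}\big( \pi(\cdot \mid x) \,\big\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big) \Big], $$

whose maximizer has the closed form $\pi^{*}(y \mid x) \propto \pi_{\mathrm{ref}}(y \mid x) \exp\!\big( r(x, y) / \beta \big)$. The paper's precise setting and analysis may differ from this generic statement.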