Sample efficient offline-to-online reinforcement learning

S Guo, L Zou, H Chen, B Qu, H Chi… - … on Knowledge and …, 2023 - ieeexplore.ieee.org
Offline reinforcement learning (RL) makes it possible to train the agents entirely from a
previously collected dataset. However, constrained by the quality of the offline dataset …

Bayesian reparameterization of reward-conditioned reinforcement learning with energy-based models

W Ding, T Che, D Zhao… - … Conference on Machine …, 2023 - proceedings.mlr.press
Recently, reward-conditioned reinforcement learning (RCRL) has gained popularity due to
its simplicity, flexibility, and off-policy nature. However, we will show that current RCRL …

ORL-AUDITOR: Dataset Auditing in Offline Deep Reinforcement Learning

L Du, M Chen, M Sun, S Ji, P Cheng, J Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Data is a critical asset in AI, as high-quality datasets can significantly improve the
performance of machine learning models. In safety-critical domains such as autonomous …

An invitation to deep reinforcement learning

B Jaeger, A Geiger - arXiv preprint arXiv:2312.08365, 2023 - arxiv.org
Training a deep neural network to maximize a target objective has become the standard
recipe for successful machine learning over the last decade. These networks can be …

Offline reinforcement learning in high-dimensional stochastic environments

F Hêche, O Barakat, T Desmettre, T Marx… - Neural Computing and …, 2024 - Springer
Offline reinforcement learning (RL) has emerged as a promising paradigm for real-world
applications since it aims to train policies directly from datasets of past interactions with the …

Robust offline policy evaluation and optimization with heavy-tailed rewards

J Zhu, R Wan, Z Qi, S Luo, C Shi - arXiv preprint arXiv:2310.18715, 2023 - arxiv.org
This paper endeavors to augment the robustness of offline reinforcement learning (RL) in
scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world …

Reinforcement learning for blast furnace ironmaking operation with safety and partial observation considerations

K Jiang, Z Jiang, X Jiang, Y Xie… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Making proper decision online in complex environment during the blast furnace (BF)
operation is a key factor in achieving long-term success and profitability in the steel …

IQL-TD-MPC: Implicit Q-learning for hierarchical model predictive control

R Chitnis, Y Xu, B Hashemi, L Lehnert, U Dogan… - arXiv preprint arXiv …, 2023 - arxiv.org
Model-based reinforcement learning (RL) has shown great promise due to its sample
efficiency, but still struggles with long-horizon sparse-reward tasks, especially in offline …

Learning from sparse offline datasets via conservative density estimation

Z Cen, Z Liu, Z Wang, Y Yao, H Lam, D Zhao - arXiv preprint arXiv …, 2024 - arxiv.org
Offline reinforcement learning (RL) offers a promising direction for learning policies from pre-
collected datasets without requiring further interactions with the environment. However …

Deep reinforcement learning for personalized treatment recommendation

M Liu, X Shen, W Pan - Statistics in Medicine, 2022 - Wiley Online Library
In precision medicine, the ultimate goal is to recommend the most effective treatment to an
individual patient based on patient‐specific molecular and clinical profiles, possibly high …