A review of safe reinforcement learning: Methods, theory and applications

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Reinforcement Learning (RL) has achieved tremendous success in many complex decision-making
tasks. However, safety concerns are raised during deploying RL in real-world …

Evolution of non-terrestrial networks from 5G to 6G: A survey

MM Azari, S Solanki, S Chatzinotas… - … surveys & tutorials, 2022 - ieeexplore.ieee.org
Non-terrestrial networks (NTNs) traditionally have certain limited applications. However, the
recent technological advancements and manufacturing cost reduction opened up myriad …

Provably good batch off-policy reinforcement learning without great exploration

Y Liu, A Swaminathan, A Agarwal… - Advances in neural …, 2020 - proceedings.neurips.cc
Batch reinforcement learning (RL) is important to apply RL algorithms to many high stakes
tasks. Doing batch RL in a way that yields a reliable new policy in large domains is …

Constrained decision transformer for offline safe reinforcement learning

Z Liu, Z Guo, Y Yao, Z Cen, W Yu… - International …, 2023 - proceedings.mlr.press
Safe reinforcement learning (RL) trains a constraint satisfaction policy by interacting with the
environment. We aim to tackle a more challenging problem: learning a safe policy from an …

Policy learning with constraints in model-free reinforcement learning: A survey

Y Liu, A Halev, X Liu - The 30th international joint conference on artificial …, 2021 - par.nsf.gov
Reinforcement Learning (RL) algorithms have had tremendous success in simulated
domains. These algorithms, however, often cannot be directly applied to physical systems …

Explainable reinforcement learning: A survey and comparative review

S Milani, N Topin, M Veloso, F Fang - ACM Computing Surveys, 2024 - dl.acm.org
Explainable reinforcement learning (XRL) is an emerging subfield of explainable machine
learning that has attracted considerable attention in recent years. The goal of XRL is to …

Towards instance-optimal offline reinforcement learning with pessimism

M Yin, YX Wang - Advances in neural information …, 2021 - proceedings.neurips.cc
We study the offline reinforcement learning (offline RL) problem, where the goal is to
learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …

Minimax weight and Q-function learning for off-policy evaluation

M Uehara, J Huang, N Jiang - International Conference on …, 2020 - proceedings.mlr.press
We provide theoretical investigations into off-policy evaluation in reinforcement learning
using function approximators for (marginalized) importance weights and value functions. Our …

Double reinforcement learning for efficient off-policy evaluation in Markov decision processes

N Kallus, M Uehara - Journal of Machine Learning Research, 2020 - jmlr.org
Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision
policies without needing to conduct exploration, which is often costly or otherwise infeasible …

Model selection for offline reinforcement learning: Practical considerations for healthcare settings

S Tang, J Wiens - Machine Learning for Healthcare …, 2021 - proceedings.mlr.press
Reinforcement learning (RL) can be used to learn treatment policies and aid decision
making in healthcare. However, given the need for generalization over complex state/action …