Batch policy learning under constraints

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns are raised when deploying RL in real-world …

Cited by 235 · All 2 versions

Evolution of non-terrestrial networks from 5G to 6G: A survey

MM Azari, S Solanki, S Chatzinotas… - … surveys & tutorials, 2022 - ieeexplore.ieee.org

Non-terrestrial networks (NTNs) traditionally have certain limited applications. However, the
recent technological advancements and manufacturing cost reduction opened up myriad …

Cited by 297 · All 10 versions

Provably good batch off-policy reinforcement learning without great exploration

Y Liu, A Swaminathan, A Agarwal… - Advances in neural …, 2020 - proceedings.neurips.cc

Batch reinforcement learning (RL) is important to apply RL algorithms to many high stakes
tasks. Doing batch RL in a way that yields a reliable new policy in large domains is …

Cited by 216 · All 7 versions

Constrained decision transformer for offline safe reinforcement learning

Z Liu, Z Guo, Y Yao, Z Cen, W Yu… - International …, 2023 - proceedings.mlr.press

Safe reinforcement learning (RL) trains a constraint satisfaction policy by interacting with the
environment. We aim to tackle a more challenging problem: learning a safe policy from an …

Cited by 40 · All 7 versions

Policy learning with constraints in model-free reinforcement learning: A survey

Y Liu, A Halev, X Liu - The 30th international joint conference on artificial …, 2021 - par.nsf.gov

Reinforcement Learning (RL) algorithms have had tremendous success in simulated
domains. These algorithms, however, often cannot be directly applied to physical systems …

Cited by 117 · All 6 versions

Explainable reinforcement learning: A survey and comparative review

S Milani, N Topin, M Veloso, F Fang - ACM Computing Surveys, 2024 - dl.acm.org

Explainable reinforcement learning (XRL) is an emerging subfield of explainable machine
learning that has attracted considerable attention in recent years. The goal of XRL is to …

Cited by 28

Towards instance-optimal offline reinforcement learning with pessimism

M Yin, YX Wang - Advances in neural information …, 2021 - proceedings.neurips.cc

We study the offline reinforcement learning (offline RL) problem, where the goal is to
learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …

Cited by 78 · All 7 versions

Minimax weight and q-function learning for off-policy evaluation

M Uehara, J Huang, N Jiang - International Conference on …, 2020 - proceedings.mlr.press

We provide theoretical investigations into off-policy evaluation in reinforcement learning
using function approximators for (marginalized) importance weights and value functions. Our …

Cited by 184 · All 6 versions

Double reinforcement learning for efficient off-policy evaluation in markov decision processes

N Kallus, M Uehara - Journal of Machine Learning Research, 2020 - jmlr.org

Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision
policies without needing to conduct exploration, which is often costly or otherwise infeasible …

Cited by 192 · All 7 versions

Model selection for offline reinforcement learning: Practical considerations for healthcare settings

S Tang, J Wiens - Machine Learning for Healthcare …, 2021 - proceedings.mlr.press

Reinforcement learning (RL) can be used to learn treatment policies and aid decision
making in healthcare. However, given the need for generalization over complex state/action …

Cited by 79 · All 9 versions