Efficient policy iteration for robust Markov decision processes via regularization

N Kumar, K Levy, K Wang, S Mannor - arXiv preprint arXiv:2205.14327, 2022 - arxiv.org
Robust Markov decision processes (MDPs) provide a general framework to model decision
problems where the system dynamics are changing or only partially known. Efficient …

Safe model‐based reinforcement learning for nonlinear optimal control with state and input constraints

Y Kim, JW Kim - AIChE Journal, 2022 - Wiley Online Library
Safety is a critical factor in reinforcement learning (RL) in chemical processes. In our
previous work, we proposed a new stability‐guaranteed RL for unconstrained nonlinear …

TempLe: Learning template of transitions for sample efficient multi-task RL

Y Sun, X Yin, F Huang - Proceedings of the AAAI Conference on …, 2021 - ojs.aaai.org
Transferring knowledge among various environments is important for efficiently learning
multiple tasks online. Most existing methods directly use the previously learned models or …

Exploration in reward machines with low regret

H Bourel, A Jonsson, OA Maillard… - International …, 2023 - proceedings.mlr.press
We study reinforcement learning (RL) for decision processes with non-Markovian reward, in
which high-level knowledge in the form of reward machines is available to the learner …

Scaling up Q-learning via exploiting state–action equivalence

Y Lyu, A Côme, Y Zhang, MS Talebi - Entropy, 2023 - mdpi.com
Recent success stories in reinforcement learning have demonstrated that leveraging
structural properties of the underlying environment is key in devising viable methods …

A simple approach for state-action abstraction using a learned MDP homomorphism

AN Mavor-Parker, MJ Sargent, A Banino… - arXiv preprint arXiv …, 2022 - arxiv.org
Animals are able to rapidly infer from limited experience when sets of state action pairs have
equivalent reward and transition dynamics. On the other hand, modern reinforcement …

An Efficient Solution to s-Rectangular Robust Markov Decision Processes

N Kumar, K Levy, K Wang, S Mannor - arXiv preprint arXiv:2301.13642, 2023 - arxiv.org
We present an efficient robust value iteration for s-rectangular robust Markov Decision
Processes (MDPs) with a time complexity comparable to standard (non-robust) MDPs which …

How to Shrink Confidence Sets for Many Equivalent Discrete Distributions?

OA Maillard, MS Talebi - arXiv preprint arXiv:2407.15662, 2024 - arxiv.org
We consider the situation when a learner faces a set of unknown discrete distributions
$(p_k)_{k\in\mathcal{K}}$ defined over a common alphabet $\mathcal{X}$, and can build for …

Efficient Value Iteration for s-rectangular Robust Markov Decision Processes

N Kumar, K Wang, KY Levy, S Mannor - Forty-first International Conference … - openreview.net
We focus on s-rectangular robust Markov decision processes (MDPs), which capture
interconnected uncertainties across different actions within each state. This framework is …

Towards Robust and Adaptable Real-World Reinforcement Learning

Y Sun - 2023 - search.proquest.com
The past decade has witnessed a rapid development of reinforcement learning (RL)
techniques. However, there is still a gap between employing RL in simulators and applying …