Joint learning of reward machines and policies in environments with partially known semantics

D Furelos-Blanco, M Law, A Jonsson… - International …, 2023 - proceedings.mlr.press

Reward machines (RMs) are a recent formalism for representing the reward function of a
reinforcement learning task through a finite-state machine whose edges encode subgoals of …

Cited by 13 · Related articles · All 9 versions

[PDF] iospress.nl

Learning task automata for reinforcement learning using hidden Markov models

A Abate, Y Almulla, J Fox, D Hyland, M Wooldridge - ECAI 2023, 2023 - ebooks.iospress.nl

Training reinforcement learning (RL) agents using scalar reward signals is often infeasible
when an environment has sparse and non-Markovian rewards. Moreover, handcrafting …

Cited by 7 · Related articles · All 4 versions

[PDF] arxiv.org

Noisy symbolic abstractions for deep RL: A case study with reward machines

AC Li, Z Chen, P Vaezipoor, TQ Klassen… - arXiv preprint arXiv …, 2022 - arxiv.org

Natural and formal languages provide an effective mechanism for humans to specify
instructions and reward functions. We investigate how to generate policies via RL when …

Cited by 9 · Related articles · All 6 versions

[PDF] mlr.press

Exploration in reward machines with low regret

H Bourel, A Jonsson, OA Maillard… - International …, 2023 - proceedings.mlr.press

We study reinforcement learning (RL) for decision processes with non-Markovian reward, in
which high-level knowledge in the form of reward machines is available to the learner …

Cited by 6 · Related articles · All 2 versions

[PDF] kr.org

Grounding LTLf specifications in image sequences

E Umili, R Capobianco… - Proceedings of the …, 2023 - proceedings.kr.org

A critical challenge in neuro-symbolic (NeSy) approaches is to handle the symbol grounding
problem without direct supervision. That is, mapping high-dimensional raw data into an …

Cited by 5 · Related articles · All 9 versions

[PDF] arxiv.org

Reward Machines for Deep RL in Noisy and Uncertain Environments

AC Li, Z Chen, TQ Klassen, P Vaezipoor… - arXiv preprint arXiv …, 2024 - arxiv.org

Reward Machines provide an automata-inspired structure for specifying instructions, safety
constraints, and other temporally extended reward-worthy behaviour. By exposing complex …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org
