A meta-MDP approach to exploration for lifelong reinforcement learning

F Garcia, PS Thomas - Advances in Neural Information …, 2019 - proceedings.neurips.cc
In this paper we consider the problem of how a reinforcement learning agent that is tasked
with solving a sequence of reinforcement learning problems (a sequence of Markov decision …

Value function decomposition for iterative design of reinforcement learning agents

J MacGlashan, E Archer, A Devlic… - Advances in …, 2022 - proceedings.neurips.cc
Designing reinforcement learning (RL) agents is typically a difficult process that requires
numerous design iterations. Learning can fail for a multitude of reasons and standard RL …

Multiadvisor reinforcement learning for multiagent multiobjective smart home energy control

A Tittaferrante, A Yassine - IEEE Transactions on Artificial …, 2021 - ieeexplore.ieee.org
Effective automated smart home energy control is essential for smart grid approaches to
demand response (DR). This is a multiobjective adaptive control problem because it …

Reward-adaptive reinforcement learning: Dynamic policy gradient optimization for bipedal locomotion

C Huang, G Wang, Z Zhou, R Zhang… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Controlling a non-statically bipedal robot is challenging due to the complex dynamics and
multi-criterion optimization involved. Recent works have demonstrated the effectiveness of …

Advancing Sustainable Manufacturing: Reinforcement Learning with Adaptive Reward Machine Using an Ontology-Based Approach

F Golpayegani, S Ghanadbashi, A Zarchini - Sustainability, 2024 - mdpi.com
Sustainable manufacturing practices are crucial in job shop scheduling (JSS) to enhance
the resilience of production systems against resource shortages and regulatory changes …

Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning

F Rietz, E Schaffernicht, S Heinrich, JA Stork - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning (RL) for complex tasks remains a challenge, primarily due to the
difficulties of engineering scalar reward functions and the inherent inefficiency of training …

Consistent aggregation of objectives with diverse time preferences requires non-Markovian rewards

S Pitis - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
As the capabilities of artificial agents improve, they are being increasingly deployed to
service multiple diverse objectives and stakeholders. However, the composition of these …

On value function representation of long horizon problems

L Lehnert, R Laroche, H van Seijen - … of the AAAI Conference on Artificial …, 2018 - ojs.aaai.org
In Reinforcement Learning, an intelligent agent has to make a sequence of
decisions to accomplish a goal. If this sequence is long, then the agent has to plan over a …

A federated advisory teacher–student framework with simultaneous learning agents

Y Lei, D Ye, T Zhu, W Zhou - Knowledge-Based Systems, 2024 - Elsevier
Multi-agent reinforcement learning requires numerous interactions with the environment and
other agents to learn an optimal policy. The teacher–student framework is one paradigm that …

On the value of myopic behavior in policy reuse

K Xu, C Bai, S Qiu, H He, B Zhao, Z Wang, W Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence.
In reinforcement learning, rationally reusing the policies acquired from other tasks or human …