Multi-advisor reinforcement learning

A meta-MDP approach to exploration for lifelong reinforcement learning

F Garcia, PS Thomas - Advances in Neural Information …, 2019 - proceedings.neurips.cc

In this paper we consider the problem of how a reinforcement learning agent that is tasked
with solving a sequence of reinforcement learning problems (a sequence of Markov decision …

Cited by 52 · Related articles · All 13 versions

[PDF] neurips.cc

Value function decomposition for iterative design of reinforcement learning agents

J MacGlashan, E Archer, A Devlic… - Advances in …, 2022 - proceedings.neurips.cc

Designing reinforcement learning (RL) agents is typically a difficult process that requires
numerous design iterations. Learning can fail for a multitude of reasons and standard RL …

Cited by 9 · Related articles · All 6 versions

Multiadvisor reinforcement learning for multiagent multiobjective smart home energy control

A Tittaferrante, A Yassine - IEEE Transactions on Artificial …, 2021 - ieeexplore.ieee.org

Effective automated smart home energy control is essential for smart grid approaches to
demand response (DR). This is a multiobjective adaptive control problem because it …

Cited by 22 · Related articles · All 2 versions

[PDF] arxiv.org

Reward-adaptive reinforcement learning: Dynamic policy gradient optimization for bipedal locomotion

C Huang, G Wang, Z Zhou, R Zhang… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Controlling a non-statically bipedal robot is challenging due to the complex dynamics and
multi-criterion optimization involved. Recent works have demonstrated the effectiveness of …

Cited by 23 · Related articles · All 7 versions

[HTML] mdpi.com

Advancing Sustainable Manufacturing: Reinforcement Learning with Adaptive Reward Machine Using an Ontology-Based Approach

F Golpayegani, S Ghanadbashi, A Zarchini - Sustainability, 2024 - mdpi.com

Sustainable manufacturing practices are crucial in job shop scheduling (JSS) to enhance
the resilience of production systems against resource shortages and regulatory changes …

Cited by 1 · Related articles · All 5 versions

[PDF] arxiv.org

Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning

F Rietz, E Schaffernicht, S Heinrich, JA Stork - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning (RL) for complex tasks remains a challenge, primarily due to the
difficulties of engineering scalar reward functions and the inherent inefficiency of training …

Cited by 1 · Related articles · All 3 versions

[PDF] neurips.cc

Consistent aggregation of objectives with diverse time preferences requires non-Markovian rewards

S Pitis - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc

As the capabilities of artificial agents improve, they are being increasingly deployed to
service multiple diverse objectives and stakeholders. However, the composition of these …

Cited by 5 · Related articles · All 6 versions

[PDF] aaai.org

On value function representation of long horizon problems

L Lehnert, R Laroche, H van Seijen - … of the AAAI Conference on Artificial …, 2018 - ojs.aaai.org

In Reinforcement Learning, an intelligent agent has to make a sequence of
decisions to accomplish a goal. If this sequence is long, then the agent has to plan over a …

Cited by 27 · Related articles · All 5 versions

A federated advisory teacher–student framework with simultaneous learning agents

Y Lei, D Ye, T Zhu, W Zhou - Knowledge-Based Systems, 2024 - Elsevier

Multi-agent reinforcement learning requires numerous interactions with the environment and
other agents to learn an optimal policy. The teacher–student framework is one paradigm that …

[PDF] arxiv.org

On the value of myopic behavior in policy reuse

K Xu, C Bai, S Qiu, H He, B Zhao, Z Wang, W Li… - arXiv preprint arXiv …, 2023 - arxiv.org

Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence.
In reinforcement learning, rationally reusing the policies acquired from other tasks or human …

Cited by 2 · Related articles · All 2 versions
