Policy Invariance under Reward Transformations for Multi-Objective Reinforcement Learning

Authors

P Mannion, S Devlin, K Mason, J Duggan, E Howley

Publication date

2017/11/8

Journal

Neurocomputing

Volume

263

Pages

60-73

Publisher

Elsevier

Description

Reinforcement Learning (RL) is a powerful and well-studied Machine Learning paradigm, where an agent learns to improve its performance in an environment by maximising a reward signal. In multi-objective Reinforcement Learning (MORL) the reward signal is a vector, where each component represents the performance on a different objective. Reward shaping is a well-established family of techniques that have been successfully used to improve the performance and learning speed of RL agents in single-objective problems. The basic premise of reward shaping is to add an additional shaping reward to the reward naturally received from the environment, to incorporate domain knowledge and guide an agent’s exploration. Potential-Based Reward Shaping (PBRS) is a specific form of reward shaping that offers additional guarantees. In this paper, we extend the theoretical guarantees of PBRS to MORL problems …
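
As a rough illustration of the idea summarised above, the sketch below applies the standard potential-based shaping term, F(s, s') = γΦ(s') − Φ(s), to each component of a reward vector. Applying one potential function per objective, the name `shaped_reward`, the integer state type, the discount value, and the example potentials are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Callable, List, Sequence

State = int  # hypothetical: a state is an integer position on a line


def shaped_reward(
    reward: Sequence[float],
    state: State,
    next_state: State,
    potentials: Sequence[Callable[[State], float]],
    gamma: float = 0.9,
) -> List[float]:
    """Add a potential-based shaping term to each component of a reward vector.

    Each objective i receives r_i + gamma * phi_i(next_state) - phi_i(state).
    """
    if len(reward) != len(potentials):
        raise ValueError("need exactly one potential function per objective")
    return [
        r + gamma * phi(next_state) - phi(state)
        for r, phi in zip(reward, potentials)
    ]


# Hypothetical two-objective example with a goal at position 10.
GOAL = 10
potentials = [
    lambda s: -float(abs(GOAL - s)),  # objective 0: higher potential nearer the goal
    lambda s: 0.0,                    # objective 1: no domain knowledge, so no shaping
]

# One step from position 3 to position 4, with environment reward [0.0, -1.0].
shaped = shaped_reward([0.0, -1.0], state=3, next_state=4, potentials=potentials)
```

In this sketch a zero potential leaves an objective's reward unchanged, so domain knowledge can be supplied for some objectives and omitted for others.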

Total citations

Cited by 55

Citations per year: 2017: 5, 2018: 4, 2019: 3, 2020: 11, 2021: 7, 2022: 10, 2023: 12, 2024: 2
