Expressing arbitrary reward functions as potential-based advice

Z Zhu, K Lin, AK Jain, J Zhou - IEEE Transactions on Pattern …, 2023 - ieeexplore.ieee.org

Reinforcement learning is a learning paradigm for solving sequential decision-making
problems. Recent years have witnessed remarkable progress in reinforcement learning …

Cited by 699 - Related articles - All 12 versions

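A survey of collective intelligence decision-making methods jointly driven by knowledge and data

Z Pu, J Yi, Z Liu, T Qiu, J Sun, F Li - Acta Automatica Sinica, 2022 - aas.net.cn

Collective intelligence (CI) systems have broad application prospects. Current CI decision-making
methods fall mainly into two categories, knowledge-driven and data-driven, each with its own strengths and weaknesses. This paper argues that jointly knowledge- and data-driven approaches will provide collective …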

Cited by 19 - Related articles - All 2 versions

[PDF] neurips.cc

Learning to utilize shaping rewards: A new approach of reward shaping

Y Hu, W Wang, H Jia, Y Wang… - Advances in …, 2020 - proceedings.neurips.cc

Reward shaping is an effective technique for incorporating domain knowledge into
reinforcement learning (RL). Existing approaches such as potential-based reward shaping …

Cited by 191 - Related articles - All 6 versions

Autonomous navigation of UAVs in large-scale complex environments: A deep reinforcement learning approach

C Wang, J Wang, Y Shen… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org

In this paper, we propose a deep reinforcement learning (DRL)-based method that allows
unmanned aerial vehicles (UAVs) to execute navigation tasks in large-scale complex …

Cited by 368 - Related articles - All 3 versions

[PDF] neurips.cc

Rudder: Return decomposition for delayed rewards

JA Arjona-Medina, M Gillhofer… - Advances in …, 2019 - proceedings.neurips.cc

We propose RUDDER, a novel reinforcement learning approach for delayed rewards in
finite Markov decision processes (MDPs). In MDPs the Q-values are equal to the expected …

Cited by 267 - Related articles - All 9 versions

[PDF] researchgate.net

Deep reinforcement learning and reward shaping based eco-driving control for automated HEVs among signalized intersections

J Li, X Wu, M Xu, Y Liu - Energy, 2022 - Elsevier

In a connected traffic environment with signalized intersections, eco-driving control needs to
co-optimize fuel economy (fuel consumption), driving safety (collisions and red lights), and …

Cited by 56 - Related articles - All 6 versions

Human-centered reinforcement learning: A survey

G Li, R Gomez, K Nakamura… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org

Human-centered reinforcement learning (RL), in which an agent learns how to perform a
task from evaluative feedback delivered by a human observer, has become more and more …

Cited by 148 - Related articles

[PDF] mlr.press

State abstractions for lifelong reinforcement learning

D Abel, D Arumugam, L Lehnert… - … on Machine Learning, 2018 - proceedings.mlr.press

In lifelong reinforcement learning, agents must effectively transfer knowledge across tasks
while simultaneously addressing exploration, credit assignment, and generalization. State …

Cited by 158 - Related articles - All 9 versions

[PDF] ssrn.com

Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management

BJ De Moor, J Gijsbrechts, RN Boute - European Journal of Operational …, 2022 - Elsevier

Deep reinforcement learning (DRL) has proven to be an effective, general-purpose
technology to develop 'good' replenishment policies in inventory management. We show …

Cited by 73 - Related articles - All 10 versions

[PDF] mlr.press

What can learned intrinsic rewards capture?

Z Zheng, J Oh, M Hessel, Z Xu… - International …, 2020 - proceedings.mlr.press

The objective of a reinforcement learning agent is to behave so as to maximise the sum of a
suitable scalar function of state: the reward. These rewards are typically given and …

Cited by 97 - Related articles - All 9 versions
