Authors
Rui Yang, Jiafei Lyu, Yu Yang, Jiangpeng Yan, Feng Luo, Dijun Luo, Lanqing Li, Xiu Li
Publication date
2021/2/25
Journal
arXiv preprint arXiv:2102.12962
Description
Multi-goal reinforcement learning is widely applied in planning and robot manipulation. Two main challenges in multi-goal reinforcement learning are sparse rewards and sample inefficiency. Hindsight Experience Replay (HER) aims to tackle the two challenges via goal relabeling. However, HER-related works still require millions of samples and huge computation. In this paper, we propose Multi-step Hindsight Experience Replay (MHER), incorporating multi-step relabeled returns based on n-step relabeling to improve sample efficiency. Despite the advantages of n-step relabeling, we theoretically and experimentally prove that the off-policy n-step bias introduced by n-step relabeling may lead to poor performance in many environments. To address the above issue, two bias-reduced MHER algorithms, MHER(λ) and Model-based MHER (MMHER), are presented. MHER(λ) exploits the λ return while MMHER benefits from model-based value expansions. Experimental results on numerous multi-goal robotic tasks show that our solutions can successfully alleviate off-policy n-step bias and achieve significantly higher sample efficiency than HER and Curriculum-guided HER with little additional computation beyond HER.
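The key quantity referenced above is the n-step return computed after hindsight goal relabeling. Below is a minimal sketch, assuming the standard n-step bootstrapped target form; the function name and argument layout are illustrative and not taken from the paper.

```python
import numpy as np

def n_step_relabeled_target(rewards, bootstrap_q, gamma, n):
    """Illustrative n-step return target under a relabeled goal g'.

    rewards: rewards r(s_t, a_t, g') for t = i, ..., i+n-1, recomputed
             with the relabeled goal g' (e.g., a future achieved goal).
    bootstrap_q: Q(s_{i+n}, pi(s_{i+n}, g'), g'), the critic's value of the
                 state reached after n steps, evaluated under g'.
    gamma: discount factor.
    n: number of lookahead steps.
    """
    discounts = gamma ** np.arange(n)          # [1, gamma, ..., gamma^(n-1)]
    multi_step_return = float(np.dot(discounts, rewards[:n]))
    return multi_step_return + gamma ** n * bootstrap_q
```

Because the n rewards were generated by an older behavior policy, this target is biased off-policy; the abstract's MHER(λ) and MMHER variants are described as ways to reduce that bias.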
Total citations
[Citations-per-year chart, 2021–2024]
Scholar articles