Authors
Ce Hao, Catherine Weaver, Chen Tang, Kenta Kawamoto, Masayoshi Tomizuka, Wei Zhan
Publication date
2023/6/14
Journal
arXiv preprint arXiv:2306.08388
Description
Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse-reward environments have been seen with skills, i.e., sequences of primitive actions. Typically, a skill latent space and policy are discovered from offline data, but the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose fine-tuning the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low- and high-level policies; these policies are also initialized and regularized by the latent space learned from offline demonstrations to guide the joint policy optimization. We validate our approach in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for optimal performance. Images and videos are available at https://sites.google.com/view/skill-critic. We plan to open-source the code with the final version.
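A minimal sketch of the idea described above: jointly updating a high-level skill-selection policy and a low-level action policy, each regularized toward a frozen prior standing in for policies pre-trained on offline demonstrations. This is not the authors' implementation; the network sizes, the dummy batch, the placeholder advantages, and the KL weight `alpha` are all hypothetical.

```python
# Hedged sketch (assumed setup, not the paper's code): one joint
# policy-gradient step for a two-level policy with demonstration-guided
# KL regularization toward frozen priors.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

STATE_DIM, SKILL_DIM, ACTION_DIM = 8, 4, 2  # hypothetical dimensions
alpha = 0.1  # hypothetical weight on the KL regularization term


class GaussianPolicy(nn.Module):
    """Maps an input to a diagonal Gaussian over outputs."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh())
        self.mu = nn.Linear(64, out_dim)
        self.log_std = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        h = self.net(x)
        return Normal(self.mu(h), self.log_std.exp())


# High level: state -> skill latent z; low level: (state, z) -> action.
high_policy = GaussianPolicy(STATE_DIM, SKILL_DIM)
low_policy = GaussianPolicy(STATE_DIM + SKILL_DIM, ACTION_DIM)
# Frozen priors standing in for policies learned from offline data.
high_prior = GaussianPolicy(STATE_DIM, SKILL_DIM)
low_prior = GaussianPolicy(STATE_DIM + SKILL_DIM, ACTION_DIM)
for p in list(high_prior.parameters()) + list(low_prior.parameters()):
    p.requires_grad_(False)

opt = torch.optim.Adam(
    list(high_policy.parameters()) + list(low_policy.parameters()), lr=3e-4
)

state = torch.randn(32, STATE_DIM)  # dummy batch of states
advantage = torch.randn(32)         # placeholder critic advantages

# Sample a skill from the high level, then an action from the low level.
high_dist = high_policy(state)
z = high_dist.rsample()
low_dist = low_policy(torch.cat([state, z], dim=-1))
a = low_dist.rsample()

# Policy-gradient term plus KL penalties toward both priors.
log_prob = high_dist.log_prob(z).sum(-1) + low_dist.log_prob(a).sum(-1)
kl = (
    kl_divergence(high_dist, high_prior(state)).sum(-1)
    + kl_divergence(low_dist, low_prior(torch.cat([state, z], dim=-1))).sum(-1)
)
loss = -(log_prob * advantage).mean() + alpha * kl.mean()

opt.zero_grad()
loss.backward()
opt.step()
print(f"joint loss: {loss.item():.3f}")
```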
Total citations
Scholar articles
C Hao, C Weaver, C Tang, K Kawamoto, M Tomizuka… - arXiv preprint arXiv:2306.08388, 2023