Authors
Alberto Maria Metelli, Emanuele Ghelfi, Marcello Restelli
Publication date
2019/5/24
Conference
International Conference on Machine Learning
Pages
4546-4555
Publisher
PMLR
Description
Configurable Markov Decision Processes (Conf-MDPs) have recently been introduced as an extension of the usual MDP model that accounts for the possibility of configuring the environment to improve the agent’s performance. Currently, there is no suitable algorithm for solving the learning problem in real-world Conf-MDPs. In this paper, we fill this gap by proposing a trust-region method, Relative Entropy Model Policy Search (REMPS), able to learn both the policy and the MDP configuration in continuous domains without requiring knowledge of the true model of the environment. After introducing our approach and providing a finite-sample analysis, we empirically evaluate REMPS on both benchmark and realistic environments, comparing our results with those of gradient-based methods.
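The abstract describes REMPS only at a high level. As a rough illustration of the trust-region idea it builds on (a relative-entropy, REPS-style update: maximize expected return over a reweighted sample distribution subject to a KL bound, then project back onto the parametric policy/configuration by weighted maximum likelihood), here is a minimal Python sketch. The function names (reps_dual, remps_style_weights), the sample-based dual, and the parameter kl_bound are illustrative assumptions, not the authors' implementation; in particular, REMPS additionally optimizes the environment-configuration parameters, which is omitted here.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def reps_dual(eta, rewards, kl_bound):
    # Dual of: max_q E_q[R] s.t. KL(q || p) <= kl_bound,
    # estimated from samples drawn under p (REPS-style; illustrative).
    shifted = (rewards - rewards.max()) / eta  # shift for numerical stability
    return eta * kl_bound + rewards.max() + eta * np.log(np.mean(np.exp(shifted)))

def remps_style_weights(rewards, kl_bound=0.1):
    # Solve the 1-D dual for the temperature eta > 0, then reweight samples.
    # Downstream (not shown), policy and model-configuration parameters
    # would be fit to these weights by weighted maximum likelihood.
    res = minimize_scalar(reps_dual, bounds=(1e-6, 1e6), method="bounded",
                          args=(rewards, kl_bound))
    eta = res.x
    w = np.exp((rewards - rewards.max()) / eta)
    return w / w.sum(), eta

# Example: larger kl_bound -> weights concentrate on high-return samples.
weights, eta = remps_style_weights(np.array([1.0, 2.0, 5.0, 0.5]), kl_bound=0.5)
```

The KL bound plays the role of the trust region: as kl_bound grows the reweighted distribution moves further from the sampling distribution, trading off improvement against estimation reliability.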
Total citations
Cited by year: 2018: 1, 2019: 2, 2020: 3, 2021: 3, 2022: 4, 2023: 2, 2024: 3
Scholar articles
AM Metelli, E Ghelfi, M Restelli - International Conference on Machine Learning, 2019