Regularization in reinforcement learning

M Benosman - International Journal of Adaptive Control and …, 2018 - Wiley Online Library

In this paper, we present an overview of adaptive control by contrasting model‐based
approaches with data‐driven approaches. Indeed, we propose to classify adaptive …

Cited by 118 · Related articles · All 4 versions

[PDF] neurips.cc

Provably good batch off-policy reinforcement learning without great exploration

Y Liu, A Swaminathan, A Agarwal… - Advances in neural …, 2020 - proceedings.neurips.cc

Batch reinforcement learning (RL) is important to apply RL algorithms to many high stakes
tasks. Doing batch RL in a way that yields a reliable new policy in large domains is …

Cited by 230 · Related articles · All 7 versions

[PDF] jmlr.org

Regularized policy iteration with nonparametric function spaces

A Farahmand, M Ghavamzadeh, C Szepesvári… - Journal of Machine …, 2016 - jmlr.org

We study two regularization-based approximate policy iteration algorithms, namely REG-LSPI
and REG-BRM, to solve reinforcement learning and planning problems in discounted …

Cited by 130 · Related articles · All 10 versions

[PDF] neurips.cc

Iterative value-aware model learning

A Farahmand - Advances in Neural Information Processing …, 2018 - proceedings.neurips.cc

This paper introduces a model-based reinforcement learning (MBRL) framework that
incorporates the underlying decision problem in learning the transition model of the …

Cited by 73 · Related articles · All 9 versions

[PDF] mlr.press

Control frequency adaptation via action persistence in batch reinforcement learning

AM Metelli, F Mazzolini, L Bisi… - International …, 2020 - proceedings.mlr.press

The choice of the control frequency of a system has a relevant impact on the ability of
reinforcement learning algorithms to learn a highly performing policy. In this paper, we …

Cited by 52 · Related articles · All 8 versions

[PDF] mlr.press

Iterate averaging as regularization for stochastic gradient descent

G Neu, L Rosasco - Conference On Learning Theory, 2018 - proceedings.mlr.press

We propose and analyze a variant of the classic Polyak–Ruppert averaging scheme,
broadly used in stochastic gradient methods. Rather than a uniform average of the iterates …

Cited by 72 · Related articles · All 6 versions

[PDF] mlr.press

Importance weighted transfer of samples in reinforcement learning

A Tirinzoni, A Sessa, M Pirotta… - … on Machine Learning, 2018 - proceedings.mlr.press

We consider the transfer of experience samples (i.e., tuples <s, a, s', r>) in reinforcement
learning (RL), collected from a set of source tasks to improve the learning process in a given …

Cited by 62 · Related articles · All 10 versions

[PDF] mlr.press

Theoretical analysis of efficiency and robustness of softmax and gap-increasing operators in reinforcement learning

T Kozuno, E Uchibe, K Doya - The 22nd International …, 2019 - proceedings.mlr.press

In this paper, we propose and analyze conservative value iteration, which unifies value
iteration, soft value iteration, advantage learning, and dynamic policy programming. Our …

Cited by 44 · Related articles · All 4 versions

[PDF] merl.com

Deep reinforcement learning for partial differential equation control

A Farahmand, S Nabi… - 2017 American Control …, 2017 - ieeexplore.ieee.org

This paper develops a data-driven method for control of partial differential equations (PDE)
based on deep reinforcement learning (RL) techniques. We design a Deep Fitted Q-Iteration …

Cited by 54 · Related articles · All 7 versions

[PDF] mlr.press

Boosted fitted Q-iteration

S Tosatto, M Pirotta, C d'Eramo… - … on Machine Learning, 2017 - proceedings.mlr.press

This paper is about the study of B-FQI, an Approximated Value Iteration (AVI) algorithm that
exploits a boosting procedure to estimate the action-value function in reinforcement learning …

Cited by 50 · Related articles · All 16 versions
