Regularized off-policy TD-learning

A review of reinforcement learning based energy management systems for electrified powertrains: Progress, challenge, and potential solution

AH Ganesh, B Xu - Renewable and Sustainable Energy Reviews, 2022 - Elsevier

The impact of internal combustion engine-powered automobiles on climate change due to
emissions and the depletion of fossil fuels has contributed to the progress of electrified …

Cited by 174 Related articles All 5 versions

Reinforcement learning based energy management systems and hydrogen refuelling stations for fuel cell electric vehicles: An overview

R Venkatasatish, C Dhanamjayulu - International Journal of Hydrogen …, 2022 - Elsevier

This paper examines the current state of the art of hydrogen refuelling stations-based
production and storage systems for fuel cell hybrid electric vehicles (FCHEV). Nowadays …

Cited by 62 Related articles All 2 versions

High-confidence off-policy evaluation

P Thomas, G Theocharous… - Proceedings of the AAAI …, 2015 - ojs.aaai.org

Many reinforcement learning algorithms use trajectories collected from the execution of one
or more policies to propose a new policy. Because execution of a bad policy can be costly or …

Cited by 327 Related articles All 15 versions

Policy evaluation with temporal differences: A survey and comparison

C Dann, G Neumann, J Peters - The Journal of Machine Learning …, 2014 - jmlr.org

Policy evaluation is an essential step in most reinforcement learning approaches. It yields a
value function, the quality assessment of states for a given policy, which can be used in a …

Cited by 293 Related articles All 21 versions

Politex: Regret bounds for policy iteration using expert prediction

Y Abbasi-Yadkori, P Bartlett, K Bhatia… - International …, 2019 - proceedings.mlr.press

We present POLITEX (POLicy ITeration with EXpert advice), a variant of policy
iteration where each policy is a Boltzmann distribution over the sum of action-value function …

Cited by 148 Related articles All 7 versions

Least-squares temporal difference learning for the linear quadratic regulator

S Tu, B Recht - International Conference on Machine …, 2018 - proceedings.mlr.press

Reinforcement learning (RL) has been successfully used to solve many continuous control
tasks. Despite its impressive results however, fundamental questions regarding the sample …

Cited by 143 Related articles All 4 versions

Finite-sample analysis of proximal gradient td algorithms

B Liu, J Liu, M Ghavamzadeh, S Mahadevan… - arXiv preprint arXiv …, 2020 - arxiv.org

In this paper, we analyze the convergence rate of the gradient temporal difference learning
(GTD) family of algorithms. Previous analyses of this class of algorithms use ODE …

Cited by 178 Related articles All 17 versions

Discount factor as a regularizer in reinforcement learning

R Amit, R Meir, K Ciosek - International conference on …, 2020 - proceedings.mlr.press

Specifying a Reinforcement Learning (RL) task involves choosing a suitable
planning horizon, which is typically modeled by a discount factor. It is known that applying …

Cited by 80 Related articles All 6 versions

Gaussian processes for learning and control: A tutorial with examples

M Liu, G Chowdhary, BC Da Silva… - IEEE Control Systems …, 2018 - ieeexplore.ieee.org

Many challenging real-world control problems require adaptation and learning in the
presence of uncertainty. Examples of these challenging domains include aircraft adaptive …

Cited by 114 Related articles All 5 versions

Regularized policy iteration with nonparametric function spaces

A Farahmand, M Ghavamzadeh, C Szepesvári… - Journal of Machine …, 2016 - jmlr.org

We study two regularization-based approximate policy iteration algorithms, namely REG-LSPI
and REG-BRM, to solve reinforcement learning and planning problems in discounted …

Cited by 126 Related articles All 10 versions
