A review of reinforcement learning based energy management systems for electrified powertrains: Progress, challenge, and potential solution

AH Ganesh, B Xu - Renewable and Sustainable Energy Reviews, 2022 - Elsevier
The impact of internal combustion engine-powered automobiles on climate change due to
emissions and the depletion of fossil fuels has contributed to the progress of electrified …

Reinforcement learning based energy management systems and hydrogen refuelling stations for fuel cell electric vehicles: An overview

R Venkatasatish, C Dhanamjayulu - International Journal of Hydrogen …, 2022 - Elsevier
This paper examines the current state of the art of hydrogen refuelling stations-based
production and storage systems for fuel cell hybrid electric vehicles (FCHEV). Nowadays …

High-confidence off-policy evaluation

P Thomas, G Theocharous… - Proceedings of the AAAI …, 2015 - ojs.aaai.org
Many reinforcement learning algorithms use trajectories collected from the execution of one
or more policies to propose a new policy. Because execution of a bad policy can be costly or …

Policy evaluation with temporal differences: A survey and comparison

C Dann, G Neumann, J Peters - The Journal of Machine Learning …, 2014 - jmlr.org
Policy evaluation is an essential step in most reinforcement learning approaches. It yields a
value function, the quality assessment of states for a given policy, which can be used in a …

Politex: Regret bounds for policy iteration using expert prediction

Y Abbasi-Yadkori, P Bartlett, K Bhatia… - International …, 2019 - proceedings.mlr.press
We present POLITEX (POLicy ITeration with EXpert advice), a variant of policy
iteration where each policy is a Boltzmann distribution over the sum of action-value function …

Least-squares temporal difference learning for the linear quadratic regulator

S Tu, B Recht - International Conference on Machine …, 2018 - proceedings.mlr.press
Reinforcement learning (RL) has been successfully used to solve many continuous control
tasks. Despite its impressive results however, fundamental questions regarding the sample …

Finite-sample analysis of proximal gradient TD algorithms

B Liu, J Liu, M Ghavamzadeh, S Mahadevan… - arXiv preprint arXiv …, 2020 - arxiv.org
In this paper, we analyze the convergence rate of the gradient temporal difference learning
(GTD) family of algorithms. Previous analyses of this class of algorithms use ODE …

Discount factor as a regularizer in reinforcement learning

R Amit, R Meir, K Ciosek - International conference on …, 2020 - proceedings.mlr.press
Specifying a Reinforcement Learning (RL) task involves choosing a suitable
planning horizon, which is typically modeled by a discount factor. It is known that applying …

Gaussian processes for learning and control: A tutorial with examples

M Liu, G Chowdhary, BC Da Silva… - IEEE Control Systems …, 2018 - ieeexplore.ieee.org
Many challenging real-world control problems require adaptation and learning in the
presence of uncertainty. Examples of these challenging domains include aircraft adaptive …

Regularized policy iteration with nonparametric function spaces

A Farahmand, M Ghavamzadeh, C Szepesvári… - Journal of Machine …, 2016 - jmlr.org
We study two regularization-based approximate policy iteration algorithms, namely REG-
LSPI and REG-BRM, to solve reinforcement learning and planning problems in discounted …