Reinforcement learning for control: Performance, stability, and deep approximators

L Buşoniu, T De Bruin, D Tolić, J Kober… - Annual Reviews in …, 2018 - Elsevier
Reinforcement learning (RL) offers powerful algorithms to search for optimal controllers of
systems with nonlinear, possibly stochastic dynamics that are unknown or highly uncertain …

[BOOK][B] From shortest paths to reinforcement learning

P Brandimarte - 2021 - Springer
There are multiple viewpoints that an author may take when writing a book on dynamic
programming (DP), depending on the research community that (s)he belongs to and the …

[HTML][HTML] Optimal dispatch of PV inverters in unbalanced distribution systems using reinforcement learning

PP Vergara, M Salazar, JS Giraldo… - International Journal of …, 2022 - Elsevier
In this paper, a Reinforcement Learning (RL)-based approach to optimally dispatch PV
inverters in unbalanced distribution systems is presented. The proposed approach exploits a …

Robust data-driven dynamic programming

GA Hanasusanto, D Kuhn - Advances in Neural Information …, 2013 - proceedings.neurips.cc
In stochastic optimal control the distribution of the exogenous noise is typically unknown and
must be inferred from limited data before dynamic programming (DP)-based solution …

Kernel dynamic policy programming: Applicable reinforcement learning to robot systems with high dimensional states

Y Cui, T Matsubara, K Sugimoto - Neural networks, 2017 - Elsevier
We propose a new value function approach for model-free reinforcement learning in Markov
decision processes involving high dimensional states that addresses the issues of …

On the use of non-stationary policies for stationary infinite-horizon Markov decision processes

B Scherrer, B Lesner - Advances in Neural Information …, 2012 - proceedings.neurips.cc
We consider infinite-horizon stationary $\gamma$-discounted Markov Decision Processes,
for which it is known that there exists a stationary optimal policy. Using Value and Policy …

[HTML][HTML] Abstraction from demonstration for efficient reinforcement learning in high-dimensional domains

LC Cobo, K Subramanian, CL Isbell Jr… - Artificial Intelligence, 2014 - Elsevier
Reinforcement learning (RL) and learning from demonstration (LfD) are two popular families
of algorithms for learning policies for sequential decision problems, but they are often …

Overview of reinforcement learning for person re-identification

W Li, X Li, C Chen, A Song - IEEE Transactions on Biometrics …, 2022 - ieeexplore.ieee.org
For intelligent surveillance, the issue of person re-identification has attracted extensive
research interest due to its great academic value and broad application prospect. This issue …

MARLIM: Multi-Agent Reinforcement Learning for Inventory Management

R Leluc, E Kadoche, A Bertoncello… - arXiv preprint arXiv …, 2023 - arxiv.org
Maintaining a balance between the supply and demand of products by optimizing
replenishment decisions is one of the most important challenges in the supply chain …

Least squares policy iteration with instrumental variables vs. direct policy search: Comparison against optimal benchmarks using energy storage

S Moazeni, WR Scott, WB Powell - INFOR: Information Systems …, 2020 - Taylor & Francis
This article studies least-squares approximate policy iteration (API) methods with
parametrized value-function approximation. We study several variations of the policy …