Least-squares methods for policy iteration

L Buşoniu, T De Bruin, D Tolić, J Kober… - Annual Reviews in …, 2018 - Elsevier

Reinforcement learning (RL) offers powerful algorithms to search for optimal controllers of
systems with nonlinear, possibly stochastic dynamics that are unknown or highly uncertain …

Cited by 478 · Related articles · All 11 versions

[BOOK][B] From shortest paths to reinforcement learning

P Brandimarte - 2021 - Springer

There are multiple viewpoints that an author may take when writing a book on dynamic
programming (DP), depending on the research community that (s)he belongs to and the …

Cited by 33 · Related articles · All 7 versions

[HTML] sciencedirect.com

[HTML] Optimal dispatch of PV inverters in unbalanced distribution systems using reinforcement learning

PP Vergara, M Salazar, JS Giraldo… - International Journal of …, 2022 - Elsevier

In this paper, a Reinforcement Learning (RL)-based approach to optimally dispatch PV
inverters in unbalanced distribution systems is presented. The proposed approach exploits a …

Cited by 32 · Related articles · All 15 versions

[PDF] neurips.cc

Robust data-driven dynamic programming

GA Hanasusanto, D Kuhn - Advances in Neural Information …, 2013 - proceedings.neurips.cc

In stochastic optimal control the distribution of the exogenous noise is typically unknown and
must be inferred from limited data before dynamic programming (DP)-based solution …

Cited by 85 · Related articles · All 11 versions

[PDF] academia.edu

Kernel dynamic policy programming: Applicable reinforcement learning to robot systems with high dimensional states

Y Cui, T Matsubara, K Sugimoto - Neural networks, 2017 - Elsevier

We propose a new value function approach for model-free reinforcement learning in Markov
decision processes involving high dimensional states that addresses the issues of …

Cited by 33 · Related articles · All 5 versions

[PDF] neurips.cc

On the use of non-stationary policies for stationary infinite-horizon Markov decision processes

B Scherrer, B Lesner - Advances in Neural Information …, 2012 - proceedings.neurips.cc

We consider infinite-horizon stationary $\gamma$-discounted Markov Decision Processes,
for which it is known that there exists a stationary optimal policy. Using Value and Policy …

Cited by 48 · Related articles · All 12 versions

[HTML] sciencedirect.com

[HTML] Abstraction from demonstration for efficient reinforcement learning in high-dimensional domains

LC Cobo, K Subramanian, CL Isbell Jr… - Artificial Intelligence, 2014 - Elsevier

Reinforcement learning (RL) and learning from demonstration (LfD) are two popular families
of algorithms for learning policies for sequential decision problems, but they are often …

Cited by 28 · Related articles · All 5 versions

Overview of reinforcement learning for person re-identification

W Li, X Li, C Chen, A Song - IEEE Transactions on Biometrics …, 2022 - ieeexplore.ieee.org

For intelligent surveillance, the issue of person re-identification has attracted extensive
research interest due to its great academic value and broad application prospect. This issue …

Cited by 1 · Related articles

[PDF] arxiv.org

MARLIM: Multi-Agent Reinforcement Learning for Inventory Management

R Leluc, E Kadoche, A Bertoncello… - arXiv preprint arXiv …, 2023 - arxiv.org

Maintaining a balance between the supply and demand of products by optimizing
replenishment decisions is one of the most important challenges in the supply chain …

Cited by 1 · Related articles · All 4 versions

[PDF] arxiv.org

Least squares policy iteration with instrumental variables vs. direct policy search: Comparison against optimal benchmarks using energy storage

S Moazeni, WR Scott, WB Powell - INFOR: Information Systems …, 2020 - Taylor & Francis

This article studies least-squares approximate policy iteration (API) methods with
parametrized value-function approximation. We study several variations of the policy …

Cited by 20 · Related articles · All 7 versions
