Least-squares methods in reinforcement learning for control

MT Cox - Artificial intelligence, 2005 - Elsevier

Various disciplines have examined the many phenomena of metacognition and have
produced numerous results, both positive and negative. I discuss some of these aspects of …

Cited by 369 | Related articles | All 12 versions

[PDF] tandfonline.com

[BOOK][B] Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions: by Warren B. Powell (ed.), Wiley (2022). Hardback. ISBN …

I Halperin - 2022 - Taylor & Francis

What is reinforcement learning? How is reinforcement learning different from stochastic
optimization? And finally, can it be used for applications to quantitative finance for my current …

Cited by 215 | Related articles | All 6 versions

[PDF] researchgate.net

Survey of model-based reinforcement learning: Applications on robotics

AS Polydoros, L Nalpantidis - Journal of Intelligent & Robotic Systems, 2017 - Springer

Reinforcement learning is an appealing approach for allowing robots to learn new tasks.
Relevant literature reveals a plethora of methods, but at the same time makes clear the lack …

Cited by 696 | Related articles | All 8 versions

[PDF] uliege.be

[BOOK][B] Reinforcement learning and dynamic programming using function approximators

L Busoniu, R Babuska, B De Schutter, D Ernst - 2017 - taylorfrancis.com

From household appliances to applications in robotics, engineered systems involving
complex dynamics can only be as effective as the algorithms that control them. While …

Cited by 1302 | Related articles | All 12 versions

[PDF] mlr.press

Generalization and exploration via randomized value functions

I Osband, B Van Roy, Z Wen - International Conference on …, 2016 - proceedings.mlr.press

We propose randomized least-squares value iteration (RLSVI), a new reinforcement
learning algorithm designed to explore and generalize efficiently via linearly parameterized …

Cited by 349 | Related articles | All 8 versions

[PDF] academia.edu

Learning Tetris using the noisy cross-entropy method

I Szita, A Lörincz - Neural computation, 2006 - ieeexplore.ieee.org

The cross-entropy method is an efficient and general optimization algorithm. However, its
applicability in reinforcement learning (RL) seems to be limited because it often converges …

Cited by 340 | Related articles | All 13 versions

[PDF] springer.com

Integrating guidance into relational reinforcement learning

K Driessens, S Džeroski - Machine Learning, 2004 - Springer

Reinforcement learning, and Q-learning in particular, encounter two major problems when
dealing with large state spaces. First, learning the Q-function in tabular form may be …

Cited by 177 | Related articles | All 18 versions

Reinforcement learning for a biped robot based on a CPG-actor-critic method

Y Nakamura, T Mori, M Sato, S Ishii - Neural networks, 2007 - Elsevier

Animals' rhythmic movements, such as locomotion, are considered to be controlled by
neural circuits called central pattern generators (CPGs), which generate oscillatory signals …

Cited by 159 | Related articles | All 11 versions

[PDF] hal.science

Least-squares methods for policy iteration

L Buşoniu, A Lazaric, M Ghavamzadeh… - … learning: state-of-the-art, 2012 - Springer

Approximate reinforcement learning deals with the essential problem of applying
reinforcement learning in large and continuous state-action spaces, by using function …

Cited by 37 | Related articles | All 22 versions

Reinforcement learning in games

I Szita - Reinforcement Learning: State-of-the-art, 2012 - Springer

Reinforcement learning and games have a long and mutually beneficial common history.
From one side, games are rich and challenging domains for testing reinforcement learning …

Cited by 100 | Related articles | All 3 versions
