On the complexity of solving Markov decision problems

N Schweighofer, SC Tanaka… - Annals of the New York …, 2007 - Wiley Online Library

The ability to select an action by considering both delays and amount of reward outcome is
critical for survival and well‐being of animals and humans. Previous animal experiments …

Cited by 83 - Related articles - All 9 versions

[PDF] academia.edu

[PDF] Topological Value Iteration Algorithm for Markov Decision Processes.

P Dai, J Goldsmith - IJCAI, 2007 - academia.edu

Value Iteration is an inefficient algorithm for Markov decision processes (MDPs) because it
puts the majority of its effort into backing up the entire state space, which turns out to be …

Cited by 89 - Related articles - All 10 versions

[PDF] psu.edu

Model-based function approximation in reinforcement learning

NK Jong, P Stone - Proceedings of the 6th international joint conference …, 2007 - dl.acm.org

Reinforcement learning promises a generic method for adapting agents to arbitrary tasks in
arbitrary stochastic environments, but applying it to new real-world problems remains …

Cited by 80 - Related articles - All 8 versions

[PDF] psu.edu

[PDF] Large scale reinforcement learning using Q-SARSA(λ) and cascading neural networks

S Nissen - Unpublished master's thesis, Department of Computer …, 2007 - Citeseer

This thesis explores how the novel model-free reinforcement learning algorithm Q-SARSA
(λ) can be combined with the constructive neural network training algorithm Cascade 2, and …

Cited by 54 - Related articles

[PDF] cmu.edu

Oracular partially observable Markov decision processes: A very special case

N Armstrong-Crews, M Veloso - Proceedings 2007 IEEE …, 2007 - ieeexplore.ieee.org

We introduce the oracular partially observable Markov decision process (OPOMDP), a type
of POMDP in which the world produces no observations; instead there is an "oracle," …

Cited by 51 - Related articles - All 9 versions

[PDF] psu.edu

Model-based exploration in continuous state spaces

NK Jong, P Stone - International Symposium on Abstraction, Reformulation …, 2007 - Springer

Modern reinforcement learning algorithms effectively exploit experience data sampled from
an unknown controlled dynamical system to compute a good control policy, but to obtain the …

Cited by 54 - Related articles - All 14 versions

[PDF] rutgers.edu

Probably approximately correct (PAC) exploration in reinforcement learning

AL Strehl - 2007 - rucore.libraries.rutgers.edu

In this thesis, we consider some fundamental problems in the field of Reinforcement
Learning (Sutton & Barto, 1998). In particular, our focus is on the problem of exploration …

Cited by 30 - Related articles - All 11 versions

[PDF] toronto.edu

[BOOK] A Bayesian approach to multiagent reinforcement learning and coalition formation under uncertainty

G Chalkiadakis - 2007 - cs.toronto.edu

Sequential decision making under uncertainty is always a challenge for autonomous agents
populating a multiagent environment, since their behaviour is inevitably influenced by the …

Cited by 27 - Related articles - All 6 versions

[PDF] psu.edu

A cognitive model of imitative development in humans and machines

AP Shon, JJ Storz, AN Meltzoff… - International Journal of …, 2007 - World Scientific

Several algorithms and models have recently been proposed for imitation learning in
humans and robots. However, few proposals offer a framework for imitation learning in noisy …

Cited by 26 - Related articles - All 6 versions

Hierarchical Markov decision processes based distributed data fusion and collaborative sensor management for multitarget multisensor tracking applications

D Akselrod, A Sinha… - 2007 IEEE International …, 2007 - ieeexplore.ieee.org

This paper presents a decision mechanism based on hierarchical Markov decision
processes as a solution for two important problems in multitarget multisensor tracking …

Cited by 22 - Related articles - All 2 versions
