Serotonin and the evaluation of future rewards: theory, experiments, and possible neural mechanisms

N Schweighofer, SC Tanaka… - Annals of the New York …, 2007 - Wiley Online Library
The ability to select an action by considering both delays and amount of reward outcome is
critical for survival and well‐being of animals and humans. Previous animal experiments …

Topological Value Iteration Algorithm for Markov Decision Processes.

P Dai, J Goldsmith - IJCAI, 2007 - academia.edu
Value Iteration is an inefficient algorithm for Markov decision processes (MDPs) because it
puts the majority of its effort into backing up the entire state space, which turns out to be …

Model-based function approximation in reinforcement learning

NK Jong, P Stone - Proceedings of the 6th international joint conference …, 2007 - dl.acm.org
Reinforcement learning promises a generic method for adapting agents to arbitrary tasks in
arbitrary stochastic environments, but applying it to new real-world problems remains …

Large scale reinforcement learning using Q-SARSA(λ) and cascading neural networks

S Nissen - Unpublished master's thesis, Department of Computer …, 2007 - Citeseer
This thesis explores how the novel model-free reinforcement learning algorithm Q-SARSA
(λ) can be combined with the constructive neural network training algorithm Cascade 2, and …

Oracular partially observable Markov decision processes: A very special case

N Armstrong-Crews, M Veloso - Proceedings 2007 IEEE …, 2007 - ieeexplore.ieee.org
We introduce the oracular partially observable Markov decision process (OPOMDP), a type
of POMDP in which the world produces no observations; instead there is an "oracle," …

Model-based exploration in continuous state spaces

NK Jong, P Stone - International Symposium on Abstraction, Reformulation …, 2007 - Springer
Modern reinforcement learning algorithms effectively exploit experience data sampled from
an unknown controlled dynamical system to compute a good control policy, but to obtain the …

Probably approximately correct (PAC) exploration in reinforcement learning

AL Strehl - 2007 - rucore.libraries.rutgers.edu
In this thesis, we consider some fundamental problems in the field of Reinforcement
Learning (Sutton & Barto, 1998). In particular, our focus is on the problem of exploration …

[Book] A Bayesian approach to multiagent reinforcement learning and coalition formation under uncertainty

G Chalkiadakis - 2007 - cs.toronto.edu
Sequential decision making under uncertainty is always a challenge for autonomous agents
populating a multiagent environment, since their behaviour is inevitably influenced by the …

A cognitive model of imitative development in humans and machines

AP Shon, JJ Storz, AN Meltzoff… - International Journal of …, 2007 - World Scientific
Several algorithms and models have recently been proposed for imitation learning in
humans and robots. However, few proposals offer a framework for imitation learning in noisy …

Hierarchical Markov decision processes based distributed data fusion and collaborative sensor management for multitarget multisensor tracking applications

D Akselrod, A Sinha… - 2007 IEEE International …, 2007 - ieeexplore.ieee.org
This paper presents a decision mechanism based on hierarchical Markov decision
processes as a solution for two important problems in multitarget multisensor tracking …