A survey of multi-objective sequential decision-making

DM Roijers, P Vamplew, S Whiteson… - Journal of Artificial …, 2013 - jair.org
Sequential decision-making problems with multiple objectives arise naturally in practice and
pose unique challenges for research in decision-theoretic planning and learning, which has …

Partially observable Markov decision processes in robotics: A survey

M Lauri, D Hsu, J Pajarinen - IEEE Transactions on Robotics, 2022 - ieeexplore.ieee.org
Noisy sensing, imperfect control, and environment changes are defining characteristics of
many real-world robot tasks. The partially observable Markov decision process (POMDP) …

Monte-Carlo planning in large POMDPs

D Silver, J Veness - Advances in neural information …, 2010 - proceedings.neurips.cc
This paper introduces a Monte-Carlo algorithm for online planning in large POMDPs. The
algorithm combines a Monte-Carlo update of the agent's belief state with a Monte-Carlo tree …

A survey of point-based POMDP solvers

G Shani, J Pineau, R Kaplow - Autonomous Agents and Multi-Agent …, 2013 - Springer
The past decade has seen a significant breakthrough in research on solving partially
observable Markov decision processes (POMDPs). Where past solvers could not scale …

[BOOK][B] Reinforcement learning and dynamic programming using function approximators

L Busoniu, R Babuska, B De Schutter, D Ernst - 2017 - taylorfrancis.com
From household appliances to applications in robotics, engineered systems involving
complex dynamics can only be as effective as the algorithms that control them. While …

Adaptive submodularity: Theory and applications in active learning and stochastic optimization

D Golovin, A Krause - Journal of Artificial Intelligence Research, 2011 - jair.org
Many problems in artificial intelligence require adaptively making a sequence of decisions
with uncertain outcomes under partial observability. Solving such stochastic optimization …

Autonomous driving in urban environments: approaches, lessons and challenges

M Campbell, M Egerstedt, JP How… - … Transactions of the …, 2010 - royalsocietypublishing.org
The development of autonomous vehicles for urban driving has seen rapid progress in the
past 30 years. This paper provides a summary of the current state of the art in autonomous …

Online planning algorithms for POMDPs

S Ross, J Pineau, S Paquet, B Chaib-Draa - Journal of Artificial Intelligence …, 2008 - jair.org
Partially Observable Markov Decision Processes (POMDPs) provide a rich
framework for sequential decision-making under uncertainty in stochastic domains …

RL for latent MDPs: Regret guarantees and a lower bound

J Kwon, Y Efroni, C Caramanis… - Advances in Neural …, 2021 - proceedings.neurips.cc
In this work, we consider the regret minimization problem for reinforcement learning in latent
Markov Decision Processes (LMDP). In an LMDP, an MDP is randomly drawn from a set of …

From bandits to Monte-Carlo tree search: The optimistic principle applied to optimization and planning

R Munos - Foundations and Trends® in Machine Learning, 2014 - nowpublishers.com
This work covers several aspects of the optimism in the face of uncertainty principle applied
to large scale optimization problems under finite numerical budget. The initial motivation for …