WB Powell - European Journal of Operational Research, 2019 - Elsevier
Stochastic optimization is an umbrella term that includes over a dozen fragmented communities, using a patchwork of sometimes overlapping notational systems with …
Noisy sensing, imperfect control, and environment changes are defining characteristics of many real-world robot tasks. The partially observable Markov decision process (POMDP) …
In a standard view of the reinforcement learning problem, an agent's goal is to efficiently identify a policy that maximizes long-term reward. However, this perspective is based on a …
Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions …
Many problems in RL, such as meta-RL, robust RL, generalization in RL, and temporal credit assignment, can be cast as POMDPs. In theory, simply augmenting model-free RL with …
Consider the following instance of the Offline Meta Reinforcement Learning (OMRL) problem: given the complete training logs of $ N $ conventional RL agents, trained on $ N …
This is the first rigorous, self-contained treatment of the theory of deep learning. Starting with the foundations of the theory and building it up, this is essential reading for any scientists …
ZD Guo, BA Pires, B Piot, JB Grill… - International …, 2020 - proceedings.mlr.press
Learning a good representation is an essential component for deep reinforcement learning (RL). Representation learning is especially important in multitask and partially observable …
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical …