Acting optimally in partially observable stochastic domains

World models and predictive coding for cognitive and developmental robotics: Frontiers and challenges

T Taniguchi, S Murata, M Suzuki, D Ognibene… - Advanced …, 2023 - Taylor & Francis

Creating autonomous robots that can actively explore the environment, acquire knowledge
and learn skills continuously is the ultimate achievement envisioned in cognitive and …

Cited by 47 · Related articles · All 14 versions

A unified framework for stochastic optimization

WB Powell - European Journal of Operational Research, 2019 - Elsevier

Stochastic optimization is an umbrella term that includes over a dozen fragmented
communities, using a patchwork of sometimes overlapping notational systems with …

Cited by 297 · Related articles · All 4 versions

Partially observable Markov decision processes in robotics: A survey

M Lauri, D Hsu, J Pajarinen - IEEE Transactions on Robotics, 2022 - ieeexplore.ieee.org

Noisy sensing, imperfect control, and environment changes are defining characteristics of
many real-world robot tasks. The partially observable Markov decision process (POMDP) …

Cited by 86 · Related articles · All 7 versions

A definition of continual reinforcement learning

D Abel, A Barreto, B Van Roy… - Advances in …, 2024 - proceedings.neurips.cc

In a standard view of the reinforcement learning problem, an agent's goal is to efficiently
identify a policy that maximizes long-term reward. However, this perspective is based on a …

Cited by 36 · Related articles · All 8 versions

VariBAD: A very good method for Bayes-adaptive deep RL via meta-learning

L Zintgraf, K Shiarlis, M Igl, S Schulze, Y Gal… - arXiv preprint arXiv …, 2019 - arxiv.org

Trading off exploration and exploitation in an unknown environment is key to maximising
expected return during learning. A Bayes-optimal policy, which does so optimally, conditions …

Cited by 262 · Related articles · All 6 versions

Recurrent model-free RL can be a strong baseline for many POMDPs

T Ni, B Eysenbach, R Salakhutdinov - arXiv preprint arXiv:2110.05038, 2021 - arxiv.org

Many problems in RL, such as meta-RL, robust RL, generalization in RL, and temporal credit
assignment, can be cast as POMDPs. In theory, simply augmenting model-free RL with …

Cited by 84 · Related articles · All 4 versions

Offline Meta Reinforcement Learning: Identifiability Challenges and Effective Data Collection Strategies

R Dorfman, I Shenfeld, A Tamar - Advances in Neural …, 2021 - proceedings.neurips.cc

Consider the following instance of the Offline Meta Reinforcement Learning (OMRL)
problem: given the complete training logs of $ N $ conventional RL agents, trained on $ N …

Cited by 61 · Related articles · All 5 versions

[BOOK][B] Deep learning in science

P Baldi - 2021 - books.google.com

This is the first rigorous, self-contained treatment of the theory of deep learning. Starting with
the foundations of the theory and building it up, this is essential reading for any scientists …

Cited by 104 · Related articles · All 9 versions

Bootstrap latent-predictive representations for multitask reinforcement learning

ZD Guo, BA Pires, B Piot, JB Grill… - International …, 2020 - proceedings.mlr.press

Learning a good representation is an essential component for deep reinforcement learning
(RL). Representation learning is especially important in multitask and partially observable …

Cited by 147 · Related articles · All 7 versions

Reinforcement learning: A survey

LP Kaelbling, ML Littman, AW Moore - Journal of artificial intelligence …, 1996 - jair.org

This paper surveys the field of reinforcement learning from a computer-science perspective.
It is written to be accessible to researchers familiar with machine learning. Both the historical …

Cited by 11834 · Related articles · All 77 versions
