Learning to filter with predictive state inference machines

M Fraccaro, S Kamronn, U Paquet… - Advances in neural …, 2017 - proceedings.neurips.cc

This paper takes a step towards temporal reasoning in a dynamically changing video, not in
the pixel space that constitutes its frames, but in a latent space that describes the non-linear …

Cited by 369 · Related articles · All 11 versions

[PDF] arxiv.org

Recurrent environment simulators

S Chiappa, S Racaniere, D Wierstra… - arXiv preprint arXiv …, 2017 - arxiv.org

Models that can simulate how environments change in response to actions can be used by
agents to plan and act efficiently. We improve on previous environment simulators from high …

Cited by 240 · Related articles · All 3 versions

[PDF] neurips.cc

Provably efficient reinforcement learning in partially observable dynamical systems

M Uehara, A Sekhari, JD Lee… - Advances in Neural …, 2022 - proceedings.neurips.cc

We study Reinforcement Learning for partially observable systems using function
approximation. We propose a new PO-bilinear framework, that is general enough to include …

Cited by 38 · Related articles · All 8 versions

[PDF] arxiv.org

PAC reinforcement learning for predictive state representations

W Zhan, M Uehara, W Sun, JD Lee - arXiv preprint arXiv:2207.05738, 2022 - arxiv.org

In this paper we study online Reinforcement Learning (RL) in partially observable dynamical
systems. We focus on the Predictive State Representations (PSRs) model, which is an …

Cited by 46 · Related articles · All 3 versions

[PDF] thecvf.com

First-person activity forecasting with online inverse reinforcement learning

N Rhinehart, KM Kitani - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com

We address the problem of incrementally modeling and forecasting long-term goals of a first-
person camera wearer: what the user will do, where they will go, and what goal they seek. In …

Cited by 158 · Related articles · All 10 versions

[PDF] mlr.press

Provably efficient imitation learning from observation alone

W Sun, A Vemula, B Boots… - … conference on machine …, 2019 - proceedings.mlr.press

We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale
MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in …

Cited by 112 · Related articles · All 6 versions

[PDF] arxiv.org

Truncated horizon policy search: Combining reinforcement learning & imitation learning

W Sun, JA Bagnell, B Boots - arXiv preprint arXiv:1805.11240, 2018 - arxiv.org

In this paper, we propose to combine imitation and reinforcement learning via the idea of
reward shaping using an oracle. We study the effectiveness of the near-optimal cost-to-go …

Cited by 107 · Related articles · All 7 versions

[PDF] neurips.cc

Dual policy iteration

W Sun, GJ Gordon, B Boots… - Advances in Neural …, 2018 - proceedings.neurips.cc

Recently, a novel class of Approximate Policy Iteration (API) algorithms have demonstrated
impressive practical performance (e.g., ExIt from [1], AlphaGo-Zero from [2]). This new family …

Cited by 80 · Related articles · All 8 versions

[PDF] arxiv.org

Embed to control partially observed systems: Representation learning with provable sample efficiency

L Wang, Q Cai, Z Yang, Z Wang - arXiv preprint arXiv:2205.13476, 2022 - arxiv.org

Reinforcement learning in partially observed Markov decision processes (POMDPs) faces
two challenges. (i) It often takes the full history to predict the future, which induces a sample …

Cited by 23 · Related articles · All 2 versions

[PDF] arxiv.org

PVEs: Position-velocity encoders for unsupervised learning of structured state representations

R Jonschkowski, R Hafner, J Scholz… - arXiv preprint arXiv …, 2017 - arxiv.org

We propose position-velocity encoders (PVEs) which learn, without supervision, to
encode images to positions and velocities of task-relevant objects. PVEs encode a single …

Cited by 75 · Related articles · All 2 versions
