A disentangled recognition and nonlinear dynamics model for unsupervised learning

M Fraccaro, S Kamronn, U Paquet… - Advances in neural …, 2017 - proceedings.neurips.cc
This paper takes a step towards temporal reasoning in a dynamically changing video, not in
the pixel space that constitutes its frames, but in a latent space that describes the non-linear …

Recurrent environment simulators

S Chiappa, S Racaniere, D Wierstra… - arXiv preprint arXiv …, 2017 - arxiv.org
Models that can simulate how environments change in response to actions can be used by
agents to plan and act efficiently. We improve on previous environment simulators from high …

Provably efficient reinforcement learning in partially observable dynamical systems

M Uehara, A Sekhari, JD Lee… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study Reinforcement Learning for partially observable systems using function
approximation. We propose a new PO-bilinear framework, that is general enough to include …

PAC reinforcement learning for predictive state representations

W Zhan, M Uehara, W Sun, JD Lee - arXiv preprint arXiv:2207.05738, 2022 - arxiv.org
In this paper we study online Reinforcement Learning (RL) in partially observable dynamical
systems. We focus on the Predictive State Representations (PSRs) model, which is an …

First-person activity forecasting with online inverse reinforcement learning

N Rhinehart, KM Kitani - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
We address the problem of incrementally modeling and forecasting long-term goals of a
first-person camera wearer: what the user will do, where they will go, and what goal they seek. In …

Provably efficient imitation learning from observation alone

W Sun, A Vemula, B Boots… - … conference on machine …, 2019 - proceedings.mlr.press
We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale
MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in …

Truncated horizon policy search: Combining reinforcement learning & imitation learning

W Sun, JA Bagnell, B Boots - arXiv preprint arXiv:1805.11240, 2018 - arxiv.org
In this paper, we propose to combine imitation and reinforcement learning via the idea of
reward shaping using an oracle. We study the effectiveness of the near-optimal cost-to-go …

Dual policy iteration

W Sun, GJ Gordon, B Boots… - Advances in Neural …, 2018 - proceedings.neurips.cc
Recently, a novel class of Approximate Policy Iteration (API) algorithms has demonstrated
impressive practical performance (e.g., ExIt from [1], AlphaGo-Zero from [2]). This new family …

Embed to control partially observed systems: Representation learning with provable sample efficiency

L Wang, Q Cai, Z Yang, Z Wang - arXiv preprint arXiv:2205.13476, 2022 - arxiv.org
Reinforcement learning in partially observed Markov decision processes (POMDPs) faces
two challenges: (i) it often takes the full history to predict the future, which induces a sample …

PVEs: Position-velocity encoders for unsupervised learning of structured state representations

R Jonschkowski, R Hafner, J Scholz… - arXiv preprint arXiv …, 2017 - arxiv.org
We propose position-velocity encoders (PVEs), which learn, without supervision, to
encode images to positions and velocities of task-relevant objects. PVEs encode a single …