Models that can simulate how environments change in response to actions can be used by agents to plan and act efficiently. We improve on previous environment simulators from high …
We study Reinforcement Learning for partially observable systems using function approximation. We propose a new PO-bilinear framework that is general enough to include …
In this paper we study online Reinforcement Learning (RL) in partially observable dynamical systems. We focus on the Predictive State Representations (PSRs) model, which is an …
N Rhinehart, KM Kitani - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
We address the problem of incrementally modeling and forecasting long-term goals of a first-person camera wearer: what the user will do, where they will go, and what goal they seek. In …
W Sun, A Vemula, B Boots… - … conference on machine …, 2019 - proceedings.mlr.press
We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in …
In this paper, we propose to combine imitation and reinforcement learning via the idea of reward shaping using an oracle. We study the effectiveness of the near-optimal cost-to-go …
Recently, a novel class of Approximate Policy Iteration (API) algorithms has demonstrated impressive practical performance (e.g., ExIt from [1], AlphaGo-Zero from [2]). This new family …
L Wang, Q Cai, Z Yang, Z Wang - arXiv preprint arXiv:2205.13476, 2022 - arxiv.org
Reinforcement learning in partially observed Markov decision processes (POMDPs) faces two challenges. (i) It often takes the full history to predict the future, which induces a sample …
We propose position-velocity encoders (PVEs) which learn, without supervision, to encode images to positions and velocities of task-relevant objects. PVEs encode a single …