A review on deep learning techniques for video prediction

S Oprea, P Martinez-Gonzalez… - … on Pattern Analysis …, 2020 - ieeexplore.ieee.org
The ability to predict, anticipate and reason about future outcomes is a key component of
intelligent decision-making systems. In light of the success of deep learning in computer …

InterDiff: Generating 3D human-object interactions with physics-informed diffusion

S Xu, Z Li, YX Wang, LY Gui - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
This paper addresses a novel task of anticipating 3D human-object interactions (HOIs). Most
existing research on HOI synthesis lacks comprehensive whole-body interactions with …

Disentangling physical dynamics from unknown factors for unsupervised video prediction

VL Guen, N Thome - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com
Leveraging physical knowledge described by partial differential equations (PDEs) is an
appealing way to improve unsupervised video forecasting models. Since physics is too …

SINC: Spatial composition of 3D human motions for simultaneous action generation

N Athanasiou, M Petrovich… - Proceedings of the …, 2023 - openaccess.thecvf.com
Our goal is to synthesize 3D human motions given textual inputs describing simultaneous
actions, for example 'waving hand' while 'walking' at the same time. We refer to generating such …

Learning multi-object dynamics with compositional neural radiance fields

D Driess, Z Huang, Y Li, R Tedrake… - Conference on robot …, 2023 - proceedings.mlr.press
We present a method to learn compositional multi-object dynamics models from image
observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and …

Joint hand motion and interaction hotspots prediction from egocentric videos

S Liu, S Tripathi, S Majumdar… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
We propose to forecast future hand-object interactions given an egocentric video. Instead of
predicting action labels or pixels, we directly predict the hand motion trajectory and the …

InfiniteNature-Zero: Learning perpetual view generation of natural scenes from single images

Z Li, Q Wang, N Snavely, A Kanazawa - European Conference on …, 2022 - Springer
We present a method for learning to generate unbounded flythrough videos of natural
scenes starting from a single view. This capability is learned from a collection of single …

SlotFormer: Unsupervised visual dynamics simulation with object-centric models

Z Wu, N Dvornik, K Greff, T Kipf, A Garg - arXiv preprint arXiv:2210.05861, 2022 - arxiv.org
Understanding dynamics from visual observations is a challenging problem that requires
disentangling individual objects from the scene and learning their interactions. While recent …

Greedy hierarchical variational autoencoders for large-scale video prediction

B Wu, S Nair, R Martin-Martin… - Proceedings of the …, 2021 - openaccess.thecvf.com
A video prediction model that generalizes to diverse scenes would enable intelligent agents
such as robots to perform a variety of tasks via planning with the model. However, while …

Video prediction recalling long-term motion context via memory alignment learning

S Lee, HG Kim, DH Choi, HI Kim… - Proceedings of the …, 2021 - openaccess.thecvf.com
Our work addresses long-term motion context issues for predicting future frames. To predict
the future precisely, it is required to capture which long-term motion context (e.g., walking or …