Compositional video prediction

A review on deep learning techniques for video prediction

S Oprea, P Martinez-Gonzalez… - … on Pattern Analysis …, 2020 - ieeexplore.ieee.org

The ability to predict, anticipate and reason about future outcomes is a key component of
intelligent decision-making systems. In light of the success of deep learning in computer …

Cited by 262 · Related articles · All 14 versions

[PDF] thecvf.com

InterDiff: Generating 3D human-object interactions with physics-informed diffusion

S Xu, Z Li, YX Wang, LY Gui - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

This paper addresses a novel task of anticipating 3D human-object interactions (HOIs). Most
existing research on HOI synthesis lacks comprehensive whole-body interactions with …

Cited by 51 · Related articles · All 6 versions

[PDF] thecvf.com

Disentangling physical dynamics from unknown factors for unsupervised video prediction

VL Guen, N Thome - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com

Leveraging physical knowledge described by partial differential equations (PDEs) is an
appealing way to improve unsupervised video forecasting models. Since physics is too …

Cited by 277 · Related articles · All 13 versions

[PDF] thecvf.com

SINC: Spatial composition of 3D human motions for simultaneous action generation

N Athanasiou, M Petrovich… - Proceedings of the …, 2023 - openaccess.thecvf.com

Our goal is to synthesize 3D human motions given textual inputs describing simultaneous
actions, for example 'waving hand' while 'walking' at the same time. We refer to generating such …

Cited by 28 · Related articles · All 14 versions

[PDF] mlr.press

Learning multi-object dynamics with compositional neural radiance fields

D Driess, Z Huang, Y Li, R Tedrake… - Conference on robot …, 2023 - proceedings.mlr.press

We present a method to learn compositional multi-object dynamics models from image
observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and …

Cited by 67 · Related articles · All 9 versions

[PDF] thecvf.com

Joint hand motion and interaction hotspots prediction from egocentric videos

S Liu, S Tripathi, S Majumdar… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

We propose to forecast future hand-object interactions given an egocentric video. Instead of
predicting action labels or pixels, we directly predict the hand motion trajectory and the …

Cited by 63 · Related articles · All 5 versions

[PDF] arxiv.org

InfiniteNature-Zero: Learning perpetual view generation of natural scenes from single images

Z Li, Q Wang, N Snavely, A Kanazawa - European Conference on …, 2022 - Springer

We present a method for learning to generate unbounded flythrough videos of natural
scenes starting from a single view. This capability is learned from a collection of single …

Cited by 43 · Related articles · All 8 versions

[PDF] arxiv.org

SlotFormer: Unsupervised visual dynamics simulation with object-centric models

Z Wu, N Dvornik, K Greff, T Kipf, A Garg - arXiv preprint arXiv:2210.05861, 2022 - arxiv.org

Understanding dynamics from visual observations is a challenging problem that requires
disentangling individual objects from the scene and learning their interactions. While recent …

Cited by 57 · Related articles · All 4 versions

[PDF] thecvf.com

Greedy hierarchical variational autoencoders for large-scale video prediction

B Wu, S Nair, R Martin-Martin… - Proceedings of the …, 2021 - openaccess.thecvf.com

A video prediction model that generalizes to diverse scenes would enable intelligent agents
such as robots to perform a variety of tasks via planning with the model. However, while …

Cited by 114 · Related articles · All 5 versions

[PDF] thecvf.com

Video prediction recalling long-term motion context via memory alignment learning

S Lee, HG Kim, DH Choi, HI Kim… - Proceedings of the …, 2021 - openaccess.thecvf.com

Our work addresses long-term motion context issues for predicting future frames. To predict
the future precisely, it is required to capture which long-term motion context (e.g., walking or …

Cited by 108 · Related articles · All 10 versions


