GLOBER: coherent non-autoregressive video generation via global guided video decoder

M Sun, W Wang, Z Qin, J Sun… - Advances in Neural …, 2024 - proceedings.neurips.cc
Video generation necessitates both global coherence and local realism. This work presents
a novel non-autoregressive method GLOBER, which first generates global features to obtain …

A Survey on Video Prediction: From Deterministic to Generative Approaches

R Ming, Z Huang, Z Ju, J Hu, L Peng, S Zhou - arXiv preprint arXiv …, 2024 - arxiv.org
Video prediction, a fundamental task in computer vision, aims to enable models to generate
sequences of future frames based on existing video content. This task has garnered …

Reparameterizing and dynamically quantizing image features for image generation

M Sun, W Wang, X Zhu, J Liu - Pattern Recognition, 2024 - Elsevier
For autoregressive image generation, vector-quantized VAEs (VQ-VAEs) quantize image
features with discrete codebook entries and reconstruct images from quantized features …

STDiff: Spatio-Temporal Diffusion for Continuous Stochastic Video Prediction

X Ye, GA Bilodeau - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Predicting future frames of a video is challenging because it is difficult to learn the
uncertainty of the underlying factors influencing their contents. In this paper, we propose a …

TinyPredNet: a lightweight framework for satellite image sequence prediction

K Dai, X Li, H Lin, Y Jiang, X Chen, Y Ye… - ACM Transactions on …, 2024 - dl.acm.org
Satellite image sequence prediction aims to precisely infer future satellite image frames with
historical observations, which is a significant and challenging dense prediction task. Though …

TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation

X Wang, X Li, Y Hu, H Zhu, C Hou, C Lan… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-driven Image to Video Generation (TI2V) aims to generate controllable video given the
first frame and corresponding textual description. The primary challenges of this task lie in …

Motion Graph Unleashed: A Novel Approach to Video Prediction

Y Zhong, L Liang, B Tang, I Zharkov… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce motion graph, a novel approach to the video prediction problem, which
predicts future video frames from limited past data. The motion graph transforms patches of …

A spatiotemporal motion prediction network based on multi-level feature disentanglement

S Chen, Y Bo, X Wu - Image and Vision Computing, 2024 - Elsevier
The prediction task is significantly challenged by the intricate scene information and motion
variations present in spatiotemporal data. Existing prediction methods struggle to accurately …

Benchmarking Multi-Modal LLMs for Testing Visual Deep Learning Systems Through the Lens of Image Mutation

L Wang, Y Yuan, A Sun, Z Li, P Ma, D Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Visual deep learning (VDL) systems have shown significant success in real-world
applications like image recognition, object detection, and autonomous driving. To evaluate …

COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation

M Sun, W Wang, X Zhu, J Liu - arXiv preprint arXiv:2410.01718, 2024 - arxiv.org
Since videos record objects moving coherently, adjacent video frames have commonness
(similar object appearances) and uniqueness (slightly changed postures). To prevent …