VideoComposer: Compositional video synthesis with motion controllability

X Wang, H Yuan, S Zhang, D Chen… - Advances in …, 2024 - proceedings.neurips.cc
The pursuit of controllability as a higher standard of visual content creation has yielded
remarkable progress in customizable image synthesis. However, achieving controllable …

Rerender a video: Zero-shot text-guided video-to-video translation

S Yang, Y Zhou, Z Liu, CC Loy - SIGGRAPH Asia 2023 Conference …, 2023 - dl.acm.org
Large text-to-image diffusion models have exhibited impressive proficiency in generating
high-quality images. However, when applying these models to the video domain, ensuring …

Unifying flow, stereo and depth estimation

H Xu, J Zhang, J Cai, H Rezatofighi… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
We present a unified formulation and model for three motion and 3D perception tasks:
optical flow, rectified stereo matching and unrectified stereo depth estimation from posed …

FlowFormer++: Masked cost volume autoencoding for pretraining optical flow estimation

X Shi, Z Huang, D Li, M Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
FlowFormer introduces a transformer architecture into optical flow estimation and achieves
state-of-the-art performance. The core component of FlowFormer is the transformer-based …

TAPIR: Tracking any point with per-frame initialization and temporal refinement

C Doersch, Y Yang, M Vecerik… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried
point on any physical surface throughout a video sequence. Our approach employs two …

NICER-SLAM: Neural implicit scene encoding for RGB SLAM

Z Zhu, S Peng, V Larsson, Z Cui… - … Conference on 3D …, 2024 - ieeexplore.ieee.org
Neural implicit representations have recently become popular in simultaneous localization
and mapping (SLAM), especially in dense visual SLAM. However, existing works either rely …

A dynamic multi-scale voxel flow network for video prediction

X Hu, Z Huang, A Huang, J Xu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
The performance of video prediction has been greatly boosted by advanced deep neural
networks. However, most of the current methods suffer from large model sizes and require …

VideoFlow: Exploiting temporal cues for multi-frame optical flow estimation

X Shi, Z Huang, W Bian, D Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce VideoFlow, a novel optical flow estimation framework for videos. In contrast to
previous methods that learn to estimate optical flow from two frames, VideoFlow concurrently …

AMT: All-pairs multi-field transforms for efficient frame interpolation

Z Li, ZL Zhu, LH Han, Q Hou… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present All-Pairs Multi-Field Transforms (AMT), a new network architecture for
video frame interpolation. It is based on two essential designs. First, we build bidirectional …

The surprising effectiveness of diffusion models for optical flow and monocular depth estimation

S Saxena, C Herrmann, J Hur, A Kar… - Advances in …, 2024 - proceedings.neurips.cc
Denoising diffusion probabilistic models have transformed image generation with their
impressive fidelity and diversity. We show that they also excel in estimating optical flow and …