Unifying flow, stereo and depth estimation

H Xu, J Zhang, J Cai, H Rezatofighi… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
We present a unified formulation and model for three motion and 3D perception tasks:
optical flow, rectified stereo matching and unrectified stereo depth estimation from posed …

FlowFormer++: Masked cost volume autoencoding for pretraining optical flow estimation

X Shi, Z Huang, D Li, M Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
FlowFormer introduces a transformer architecture into optical flow estimation and achieves
state-of-the-art performance. The core component of FlowFormer is the transformer-based …

Infinite photorealistic worlds using procedural generation

A Raistrick, L Lipson, Z Ma, L Mei… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce Infinigen, a procedural generator of photorealistic 3D scenes of the natural
world. Infinigen is entirely procedural: every asset, from shape to texture, is generated from …

CroCo v2: Improved cross-view completion pre-training for stereo matching and optical flow

P Weinzaepfel, T Lucas, V Leroy… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite impressive performance for high-level downstream tasks, self-supervised
pre-training methods have not yet fully delivered on dense geometric vision tasks such as stereo …

GAFlow: Incorporating Gaussian attention into optical flow

A Luo, F Yang, X Li, L Nie, C Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Optical flow, or the estimation of motion fields from image sequences, is one of the
fundamental problems in computer vision. Unlike most pixel-wise tasks that aim at achieving …

Deep equilibrium approaches to diffusion models

A Pokle, Z Geng, JZ Kolter - Advances in Neural …, 2022 - proceedings.neurips.cc
Diffusion-based generative models are extremely effective in generating high-quality
images, with generated samples often surpassing the quality of those produced by other …

MC-JEPA: A joint-embedding predictive architecture for self-supervised learning of motion and content features

A Bardes, J Ponce, Y LeCun - arXiv preprint arXiv:2307.12698, 2023 - arxiv.org
Self-supervised learning of visual representations has been focusing on learning content
features, which do not capture object motion or location, and focus on identifying and …

Exploiting connections between Lipschitz structures for certifiably robust deep equilibrium models

A Havens, A Araujo, S Garg… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recently, deep equilibrium models (DEQs) have drawn increasing attention from the
machine learning community. However, DEQs are much less understood in terms of certified …

Recurrence without recurrence: Stable video landmark detection with deep equilibrium models

P Micaelli, A Vahdat, H Yin, J Kautz… - Proceedings of the …, 2023 - openaccess.thecvf.com
Cascaded computation, whereby predictions are recurrently refined over several stages, has
been a persistent theme throughout the development of landmark detection models. In this …

Deep equilibrium object detection

S Wang, Y Teng, L Wang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Query-based object detectors directly decode image features into object instances with a set
of learnable queries. These query vectors are progressively refined to stable meaningful …