How well do self-supervised models transfer?

L Ericsson, H Gouk… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Self-supervised visual representation learning has seen huge progress recently, but no
large scale evaluation has compared the many models now available. We evaluate the …

Enforcing geometric constraints of virtual normal for depth prediction

W Yin, Y Liu, C Shen, Y Yan - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
Monocular depth prediction plays a crucial role in understanding 3D scene geometry.
Although recent methods have achieved impressive progress in evaluation metrics such as …

Probing the 3D awareness of visual foundation models

M El Banani, A Raj, KK Maninis, A Kar… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advances in large-scale pretraining have yielded visual foundation models with
strong capabilities. Not only can recent models generalize to arbitrary images for their …

Scaling and benchmarking self-supervised visual representation learning

P Goyal, D Mahajan, A Gupta… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Self-supervised learning aims to learn representations from the data itself without explicit
manual supervision. Existing efforts ignore a crucial aspect of self-supervised learning: the …

Depth prediction without the sensors: Leveraging structure for unsupervised learning from monocular videos

V Casser, S Pirk, R Mahjourian, A Angelova - Proceedings of the AAAI …, 2019 - aaai.org
Learning to predict scene depth from RGB inputs is a challenging task both for indoor and
outdoor robot navigation. In this work we address unsupervised learning of scene depth and …

Pattern-affinitive propagation across depth, surface normal and semantic segmentation

Z Zhang, Z Cui, C Xu, Y Yan… - Proceedings of the …, 2019 - openaccess.thecvf.com
In this paper, we propose a novel Pattern-Affinitive Propagation (PAP) framework to jointly
predict depth, surface normal and semantic segmentation. The motivation behind it comes …

UberNet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory

I Kokkinos - Proceedings of the IEEE conference on …, 2017 - openaccess.thecvf.com
In this work we train in an end-to-end manner a convolutional neural network (CNN) that
jointly handles low-, mid-, and high-level vision tasks in a unified architecture. Such a …

GeoNet: Geometric neural network for joint depth and surface normal estimation

X Qi, R Liao, Z Liu, R Urtasun… - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com
In this paper, we propose Geometric Neural Network (GeoNet) to jointly predict depth and
surface normal maps from a single image. Building on top of two-stream CNNs, our GeoNet …

Conditional random fields as recurrent neural networks

S Zheng, S Jayasumana… - Proceedings of the …, 2015 - cv-foundation.org
Pixel-level labelling tasks, such as semantic segmentation, play a central role in image
understanding. Recent approaches have attempted to harness the capabilities of deep …

Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture

D Eigen, R Fergus - … of the IEEE international conference on …, 2015 - openaccess.thecvf.com
In this paper we address three different computer vision tasks using a single basic
architecture: depth prediction, surface normal estimation, and semantic labeling. We use a …