Vision transformers for dense prediction: A survey

S Zuo, Y Xiao, X Chang, X Wang - Knowledge-Based Systems, 2022 - Elsevier
Transformers have demonstrated impressive expressiveness and transfer capability in
computer vision fields. Dense prediction is a fundamental problem in computer vision that is …

Recent advances in vision transformer: A survey and outlook of recent work

K Islam - arXiv preprint arXiv:2203.01536, 2022 - arxiv.org
Vision Transformers (ViTs) are becoming more popular and dominating technique for
various vision tasks, compare to Convolutional Neural Networks (CNNs). As a demanding …

MHFormer: Multi-hypothesis transformer for 3D human pose estimation

W Li, H Liu, H Tang, P Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Estimating 3D human poses from monocular videos is a challenging task due to depth
ambiguity and self-occlusion. Most existing works attempt to solve both issues by exploiting …

Repurposing diffusion-based image generators for monocular depth estimation

B Ke, A Obukhov, S Huang, N Metzger… - Proceedings of the …, 2024 - openaccess.thecvf.com
Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth
from a single image is geometrically ill-posed and requires scene understanding so it is not …

iDisc: Internal discretization for monocular depth estimation

L Piccinelli, C Sakaridis, F Yu - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Monocular depth estimation is fundamental for 3D scene understanding and downstream
applications. However, even under the supervised setup, it is still challenging and ill-posed …

Attention attention everywhere: Monocular depth prediction with skip attention

A Agarwal, C Arora - Proceedings of the IEEE/CVF Winter …, 2023 - openaccess.thecvf.com
Monocular Depth Estimation (MDE) aims to predict pixel-wise depth given a single
RGB image. For both, the convolutional as well as the recent attention-based models …

Metric3D: Towards zero-shot metric 3D prediction from a single image

W Yin, C Zhang, H Chen, Z Cai, G Yu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Reconstructing accurate 3D scenes from images is a long-standing vision task. Due to the
ill-posedness of the single-image reconstruction problem, most well-established methods are …

DepthFormer: Exploiting long-range correlation and local information for accurate monocular depth estimation

Z Li, Z Chen, X Liu, J Jiang - Machine Intelligence Research, 2023 - Springer
This paper aims to address the problem of supervised monocular depth estimation. We start
with a meticulous pilot study to demonstrate that the long-range correlation is essential for …

MonoViT: Self-supervised monocular depth estimation with a vision transformer

C Zhao, Y Zhang, M Poggi, F Tosi… - … conference on 3D …, 2022 - ieeexplore.ieee.org
Self-supervised monocular depth estimation is an attractive solution that does not require
hard-to-source depth labels for training. Convolutional neural networks (CNNs) have …

DDP: Diffusion model for dense visual prediction

Y Ji, Z Chen, E Xie, L Hong, X Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose a simple, efficient, yet powerful framework for dense visual predictions based
on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative …