- Academic resource search

Vision transformers for dense prediction: A survey

S Zuo, Y Xiao, X Chang, X Wang - Knowledge-Based Systems, 2022 - Elsevier

Transformers have demonstrated impressive expressiveness and transfer capability in
computer vision fields. Dense prediction is a fundamental problem in computer vision that is …

Cited by 34 Related articles All 3 versions

Recent advances in vision transformer: A survey and outlook of recent work

K Islam - arXiv preprint arXiv:2203.01536, 2022 - arxiv.org

Vision Transformers (ViTs) are becoming a more popular and dominant technique for
various vision tasks, compared to Convolutional Neural Networks (CNNs). As a demanding …

Cited by 44 Related articles All 2 versions

[PDF] thecvf.com

MHFormer: Multi-hypothesis transformer for 3D human pose estimation

W Li, H Liu, H Tang, P Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com

Estimating 3D human poses from monocular videos is a challenging task due to depth
ambiguity and self-occlusion. Most existing works attempt to solve both issues by exploiting …

Cited by 264 Related articles All 10 versions

[PDF] thecvf.com

Repurposing diffusion-based image generators for monocular depth estimation

B Ke, A Obukhov, S Huang, N Metzger… - Proceedings of the …, 2024 - openaccess.thecvf.com

Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth
from a single image is geometrically ill-posed and requires scene understanding so it is not …

Cited by 50 Related articles All 3 versions

[PDF] thecvf.com

iDisc: Internal discretization for monocular depth estimation

L Piccinelli, C Sakaridis, F Yu - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Monocular depth estimation is fundamental for 3D scene understanding and downstream
applications. However, even under the supervised setup, it is still challenging and ill-posed …

Cited by 61 Related articles All 10 versions

[PDF] thecvf.com

Attention attention everywhere: Monocular depth prediction with skip attention

A Agarwal, C Arora - Proceedings of the IEEE/CVF Winter …, 2023 - openaccess.thecvf.com

Monocular Depth Estimation (MDE) aims to predict pixel-wise depth given a single
RGB image. For both, the convolutional as well as the recent attention-based models …

Cited by 119 Related articles All 5 versions

[PDF] thecvf.com

Metric3D: Towards zero-shot metric 3D prediction from a single image

W Yin, C Zhang, H Chen, Z Cai, G Yu… - Proceedings of the …, 2023 - openaccess.thecvf.com

Reconstructing accurate 3D scenes from images is a long-standing vision task. Due to the
ill-posedness of the single-image reconstruction problem, most well-established methods are …

Cited by 43 Related articles All 6 versions

[PDF] springer.com

DepthFormer: Exploiting long-range correlation and local information for accurate monocular depth estimation

Z Li, Z Chen, X Liu, J Jiang - Machine Intelligence Research, 2023 - Springer

This paper aims to address the problem of supervised monocular depth estimation. We start
with a meticulous pilot study to demonstrate that the long-range correlation is essential for …

Cited by 144 Related articles All 7 versions

[PDF] arxiv.org

MonoViT: Self-supervised monocular depth estimation with a vision transformer

C Zhao, Y Zhang, M Poggi, F Tosi… - … conference on 3D …, 2022 - ieeexplore.ieee.org

Self-supervised monocular depth estimation is an attractive solution that does not require
hard-to-source depth labels for training. Convolutional neural networks (CNNs) have …

Cited by 139 Related articles All 6 versions

[PDF] thecvf.com

DDP: Diffusion model for dense visual prediction

Y Ji, Z Chen, E Xie, L Hong, X Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com

We propose a simple, efficient, yet powerful framework for dense visual predictions based
on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative …

Cited by 56 Related articles All 6 versions
