K Islam - arXiv preprint arXiv:2203.01536, 2022 - arxiv.org
Vision Transformers (ViTs) are becoming a more popular and dominant technique for various vision tasks compared to Convolutional Neural Networks (CNNs). As a demanding …
Estimating 3D human poses from monocular videos is a challenging task due to depth ambiguity and self-occlusion. Most existing works attempt to solve both issues by exploiting …
Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth from a single image is geometrically ill-posed and requires scene understanding so it is not …
Monocular depth estimation is fundamental for 3D scene understanding and downstream applications. However, even under the supervised setup, it is still challenging and ill-posed …
A Agarwal, C Arora - Proceedings of the IEEE/CVF Winter …, 2023 - openaccess.thecvf.com
Monocular Depth Estimation (MDE) aims to predict pixel-wise depth given a single RGB image. For both, the convolutional as well as the recent attention-based models …
W Yin, C Zhang, H Chen, Z Cai, G Yu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Reconstructing accurate 3D scenes from images is a long-standing vision task. Due to the ill-posedness of the single-image reconstruction problem, most well-established methods are …
Z Li, Z Chen, X Liu, J Jiang - Machine Intelligence Research, 2023 - Springer
This paper aims to address the problem of supervised monocular depth estimation. We start with a meticulous pilot study to demonstrate that the long-range correlation is essential for …
Self-supervised monocular depth estimation is an attractive solution that does not require hard-to-source depth labels for training. Convolutional neural networks (CNNs) have …
We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative …