ZoeDepth: Zero-shot transfer by combining relative and metric depth

Depth anything: Unleashing the power of large-scale unlabeled data

L Yang, B Kang, Z Huang, X Xu… - Proceedings of the …, 2024 - openaccess.thecvf.com

This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …

Cited by 161 · Related articles · All 6 versions

Unleashing text-to-image diffusion models for visual perception

W Zhao, Y Rao, Z Liu, B Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com

Diffusion models (DMs) have become the new trend of generative models and have
demonstrated a powerful ability of conditional synthesis. Among those, text-to-image …

Cited by 107 · Related articles · All 5 versions

SpatialVLM: Endowing vision-language models with spatial reasoning capabilities

B Chen, Z Xu, S Kirmani, B Ichter… - Proceedings of the …, 2024 - openaccess.thecvf.com

Understanding and reasoning about spatial relationships is crucial for Visual Question
Answering (VQA) and robotics. Vision Language Models (VLMs) have shown impressive …

Cited by 41 · Related articles · All 5 versions

Depth-regularized optimization for 3D Gaussian splatting in few-shot images

J Chung, J Oh, KM Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

This paper presents a method to optimize Gaussian splatting with a limited number of
images while avoiding overfitting. Representing a 3D scene by combining numerous …

Cited by 25 · Related articles · All 3 versions

Towards zero-shot scale-aware monocular depth estimation

V Guizilini, I Vasiljevic, D Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Monocular depth estimation is scale-ambiguous, and thus requires scale supervision to
produce metric predictions. Even so, the resulting models will be geometry-specific, with …

Cited by 30 · Related articles · All 5 versions

Probing the 3D awareness of visual foundation models

M El Banani, A Raj, KK Maninis, A Kar… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recent advances in large-scale pretraining have yielded visual foundation models with
strong capabilities. Not only can recent models generalize to arbitrary images for their …

Cited by 17 · Related articles · All 3 versions

LucidDreamer: Domain-free generation of 3D Gaussian splatting scenes

J Chung, S Lee, H Nam, J Lee, KM Lee - arXiv preprint arXiv:2311.13384, 2023 - arxiv.org

With the widespread usage of VR devices and contents, demands for 3D scene generation
techniques become more popular. Existing 3D scene generation models, however, limit the …

Cited by 41 · Related articles · All 2 versions

NTIRE 2024 challenge on HR depth from images of specular and transparent surfaces

PZ Ramirez, F Tosi, L Di Stefano… - Proceedings of the …, 2024 - openaccess.thecvf.com

This paper reports on the NTIRE 2024 challenge on HR Depth From images of Specular and
Transparent surfaces held in conjunction with the New Trends in Image Restoration and …

Cited by 28 · Related articles · All 6 versions

ZeroNVS: Zero-shot 360-degree view synthesis from a single real image

K Sargent, Z Li, T Shah, C Herrmann, HX Yu… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce a 3D-aware diffusion model, ZeroNVS, for single-image novel view synthesis
for in-the-wild scenes. While existing methods are designed for single objects with masked …

Cited by 30 · Related articles · All 3 versions

StyleGAN knows normal, depth, albedo, and more

A Bhattad, D McKee, D Hoiem… - Advances in Neural …, 2024 - proceedings.neurips.cc

Intrinsic images, in the original sense, are image-like maps of scene properties like depth,
normal, albedo, or shading. This paper demonstrates that StyleGAN can easily be induced …

Cited by 19 · Related articles · All 6 versions
