Depth Anything: Unleashing the power of large-scale unlabeled data

L Yang, B Kang, Z Huang, X Xu… - Proceedings of the …, 2024 - openaccess.thecvf.com
This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …

Unleashing text-to-image diffusion models for visual perception

W Zhao, Y Rao, Z Liu, B Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Diffusion models (DMs) have become the new trend in generative modeling and have
demonstrated a powerful capacity for conditional synthesis. Among these, text-to-image …

SpatialVLM: Endowing vision-language models with spatial reasoning capabilities

B Chen, Z Xu, S Kirmani, B Ichter… - Proceedings of the …, 2024 - openaccess.thecvf.com
Understanding and reasoning about spatial relationships is crucial for Visual Question
Answering (VQA) and robotics. Vision Language Models (VLMs) have shown impressive …

Depth-regularized optimization for 3D Gaussian splatting in few-shot images

J Chung, J Oh, KM Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
This paper presents a method to optimize Gaussian splatting with a limited number of
images while avoiding overfitting. Representing a 3D scene by combining numerous …

Towards zero-shot scale-aware monocular depth estimation

V Guizilini, I Vasiljevic, D Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Monocular depth estimation is scale-ambiguous, and thus requires scale supervision to
produce metric predictions. Even so, the resulting models will be geometry-specific, with …

Probing the 3D awareness of visual foundation models

M El Banani, A Raj, KK Maninis, A Kar… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advances in large-scale pretraining have yielded visual foundation models with
strong capabilities. Not only can recent models generalize to arbitrary images for their …

LucidDreamer: Domain-free generation of 3D Gaussian splatting scenes

J Chung, S Lee, H Nam, J Lee, KM Lee - arXiv preprint arXiv:2311.13384, 2023 - arxiv.org
With the widespread use of VR devices and content, demand for 3D scene generation
techniques is growing. Existing 3D scene generation models, however, limit the …

NTIRE 2024 challenge on HR depth from images of specular and transparent surfaces

PZ Ramirez, F Tosi, L Di Stefano… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper reports on the NTIRE 2024 challenge on HR depth from images of specular and
transparent surfaces, held in conjunction with the New Trends in Image Restoration and …

ZeroNVS: Zero-shot 360-degree view synthesis from a single real image

K Sargent, Z Li, T Shah, C Herrmann, HX Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce a 3D-aware diffusion model, ZeroNVS, for single-image novel view synthesis
for in-the-wild scenes. While existing methods are designed for single objects with masked …

StyleGAN knows normal, depth, albedo, and more

A Bhattad, D McKee, D Hoiem… - Advances in Neural …, 2024 - proceedings.neurips.cc
Intrinsic images, in the original sense, are image-like maps of scene properties like depth,
normal, albedo, or shading. This paper demonstrates that StyleGAN can easily be induced …