Probing the 3D awareness of visual foundation models

M El Banani, A Raj, KK Maninis, A Kar… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advances in large-scale pretraining have yielded visual foundation models with
strong capabilities. Not only can recent models generalize to arbitrary images for their …

HUGS: Holistic urban 3D scene understanding via Gaussian splatting

H Zhou, J Shao, L Xu, D Bai, W Qiu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Holistic understanding of urban scenes based on RGB images is a challenging yet important
problem. It encompasses understanding both the geometry and appearance to enable novel …

PolyMaX: General dense prediction with mask transformer

X Yang, L Yuan, K Wilber, A Sharma… - Proceedings of the …, 2024 - openaccess.thecvf.com
Dense prediction tasks, such as semantic segmentation, depth estimation, and surface
normal prediction, can be easily formulated as per-pixel classification (discrete outputs) or …

VP-Net: Voxels as points for 3-D object detection

Z Song, H Wei, C Jia, Y Xia, X Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
3-D object detection with light detection and ranging (LiDAR) point clouds is a
challenging problem, which requires 3-D scene understanding, yet this task is critical to …

PatchFusion: An end-to-end tile-based framework for high-resolution monocular metric depth estimation

Z Li, SF Bhat, P Wonka - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Single image depth estimation is a foundational task in computer vision and generative
modeling. However, prevailing depth estimation models grapple with accommodating the …

Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey

U Rajapaksha, F Sohel, H Laga, D Diepeveen… - ACM Computing …, 2024 - dl.acm.org
Estimating depth from single RGB images and videos is of widespread interest due to its
applications in many areas, including autonomous driving, 3D reconstruction, digital …

VoxelNextFusion: A simple, unified and effective voxel fusion framework for multi-modal 3D object detection

Z Song, G Zhang, J Xie, L Liu, C Jia, S Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
LiDAR-camera fusion can enhance the performance of 3D object detection by utilizing
complementary information between depth-aware LiDAR points and semantically rich …

Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion

F Zhang, S You, Y Li, Y Fu - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Monocular depth estimation has experienced significant progress on terrestrial images in
recent years thanks to deep learning advancements. But it remains inadequate for …

Joint depth prediction and semantic segmentation with multi-view SAM

M Shvets, D Zhao, M Niethammer… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multi-task approaches to joint depth and segmentation prediction are well-studied for
monocular images. Yet, predictions from a single view are inherently limited, while multiple …

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Y Ge, Y Tang, J Xu, C Gokmen, C Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
The systematic evaluation and understanding of computer vision models under varying
conditions require large amounts of data with comprehensive and customized labels which …