ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation

B Zhang, X Cai, J Yuan, D Yang, J Guo, R Xia… - arXiv preprint arXiv …, 2023 - arxiv.org
Domain shifts such as sensor type changes and geographical situation variations are
prevalent in Autonomous Driving (AD), which poses a challenge since AD models relying on …

CLIP²: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data

Y Zeng, C Jiang, J Mao, J Han, C Ye… - Proceedings of the …, 2023 - openaccess.thecvf.com
Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled
text-image pairs, has demonstrated great performance in open-world vision understanding …

Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving

M Najibi, J Ji, Y Zhou, CR Qi, X Yan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Closed-set 3D perception models trained on only a pre-defined set of object categories can
be inadequate for safety-critical applications such as autonomous driving, where new object …

CLIP-FO3D: Learning Free Open-World 3D Scene Representations from 2D Dense CLIP

J Zhang, R Dong, K Ma - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Training a 3D scene understanding model requires complicated human annotations, which
are laborious to collect and result in a model only encoding closed-set object semantics. In …

CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP

R Chen, Y Liu, L Kong, X Zhu, Y Ma… - Proceedings of the …, 2023 - openaccess.thecvf.com
Contrastive Language-Image Pre-training (CLIP) achieves promising results in 2D
zero-shot and few-shot learning. Despite the impressive performance in 2D, applying CLIP …

PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-World Learning

X Zhu, R Zhang, B He, Z Guo, Z Zeng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large-scale pre-trained models have shown promising open-world performance for both
vision and language tasks. However, their transferred capacity on 3D point clouds is still …

PointCLIP: Point Cloud Understanding by CLIP

R Zhang, Z Guo, W Zhang, K Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recently, zero-shot and few-shot learning via Contrastive Vision-Language Pre-training
(CLIP) has shown inspirational performance on 2D visual recognition, which learns to …

PLA: Language-Driven Open-Vocabulary 3D Scene Understanding

R Ding, J Yang, C Xue, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Open-vocabulary scene understanding aims to localize and recognize unseen categories
beyond the annotated label space. The recent breakthrough of 2D open-vocabulary …

MV-DeepSDF: Implicit Modeling with Multi-Sweep Point Clouds for 3D Vehicle Reconstruction in Autonomous Driving

Y Liu, K Zhu, G Wu, Y Ren, B Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Reconstructing 3D vehicles from noisy and sparse partial point clouds is of great
significance to autonomous driving. Most existing 3D reconstruction methods cannot be …

ULIP-2: Towards Scalable Multimodal Pre-Training for 3D Understanding

L Xue, N Yu, S Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advancements in multimodal pre-training have shown promising efficacy in 3D
representation learning by aligning multimodal features across 3D shapes, their 2D …