2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds

X Yan, J Gao, C Zheng, C Zheng, R Zhang… - … on Computer Vision, 2022 - Springer
As camera and LiDAR sensors capture complementary information in autonomous driving,
great efforts have been made to conduct semantic segmentation through multi-modality data …

PointCLIP: Point Cloud Understanding by CLIP

R Zhang, Z Guo, W Zhang, K Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recently, zero-shot and few-shot learning via Contrastive Vision-Language Pre-training
(CLIP) have shown inspirational performance on 2D visual recognition, which learns to …

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding

M Afham, I Dissanayake… - Proceedings of the …, 2022 - openaccess.thecvf.com
Manual annotation of large-scale point cloud datasets for varying tasks such as 3D object
classification, segmentation, and detection is often laborious owing to the irregular structure …

Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders

R Zhang, L Wang, Y Qiao, P Gao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Pre-training on abundant image data has become the de facto standard for robust 2D representations. In
contrast, due to the expensive data processing, a paucity of 3D datasets severely hinders …

OneLLM: One Framework to Align All Modalities with Language

J Han, K Gong, Y Zhang, J Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs) have gained significant attention due to their
strong multimodal understanding capability. However, existing works rely heavily on modality …

RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving

A Ando, S Gidaris, A Bursuc, G Puy… - Proceedings of the …, 2023 - openaccess.thecvf.com
Casting semantic segmentation of outdoor LiDAR point clouds as a 2D problem, e.g., via
range projection, is an effective and popular approach. These projection-based methods …

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-Modality for 3D Understanding, Generation, and Instruction Following

Z Guo, R Zhang, X Zhu, Y Tang, X Ma, J Han… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image,
language, audio, and video. Guided by ImageBind, we construct a joint embedding space …

CLIP-FO3D: Learning Free Open-World 3D Scene Representations from 2D Dense CLIP

J Zhang, R Dong, K Ma - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Training a 3D scene understanding model requires complicated human annotations, which
are laborious to collect and result in a model encoding only closed-set object semantics. In …

CLIP Goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition

D Hegde, JMJ Valanarasu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-Language models like CLIP have been widely adopted for various tasks due to their
impressive zero-shot capabilities. However, CLIP is not suitable for extracting 3D geometric …

GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds

Z Zhang, B Yang, B Wang, B Li - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We study the problem of 3D semantic segmentation from raw point clouds. Unlike existing
methods, which rely primarily on large amounts of human annotations for training neural …