2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds

X Yan, J Gao, C Zheng, C Zheng, R Zhang… - … on Computer Vision, 2022 - Springer
As camera and LiDAR sensors capture complementary information in autonomous driving,
great efforts have been made to conduct semantic segmentation through multi-modality data …

PointCLIP: Point Cloud Understanding by CLIP

R Zhang, Z Guo, W Zhang, K Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recently, zero-shot and few-shot learning via Contrastive Vision-Language Pre-training
(CLIP) have shown inspirational performance on 2D visual recognition, which learns to …

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding

M Afham, I Dissanayake… - Proceedings of the …, 2022 - openaccess.thecvf.com
Manual annotation of large-scale point cloud datasets for varying tasks such as 3D object
classification, segmentation, and detection is often laborious owing to the irregular structure …

Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders

R Zhang, L Wang, Y Qiao, P Gao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Pre-training on abundant image data has become the de facto standard for robust 2D representations. In
contrast, due to the expensive data processing, a paucity of 3D datasets severely hinders …

OneLLM: One Framework to Align All Modalities with Language

J Han, K Gong, Y Zhang, J Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs) have gained significant attention due to their
strong multimodal understanding capability. However, existing works rely heavily on modality …

RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving

A Ando, S Gidaris, A Bursuc, G Puy… - Proceedings of the …, 2023 - openaccess.thecvf.com
Casting semantic segmentation of outdoor LiDAR point clouds as a 2D problem, e.g., via
range projection, is an effective and popular approach. These projection-based methods …

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-Modality for 3D Understanding, Generation, and Instruction Following

Z Guo, R Zhang, X Zhu, Y Tang, X Ma, J Han… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image,
language, audio, and video. Guided by ImageBind, we construct a joint embedding space …

CLIP-FO3D: Learning Free Open-World 3D Scene Representations from 2D Dense CLIP

J Zhang, R Dong, K Ma - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Training a 3D scene understanding model requires complicated human annotations, which
are laborious to collect and result in a model encoding only closed-set object semantics. In …

CLIP Goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition

D Hegde, JMJ Valanarasu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-Language models like CLIP have been widely adopted for various tasks due to their
impressive zero-shot capabilities. However, CLIP is not suitable for extracting 3D geometric …

GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds

Z Zhang, B Yang, B Wang, B Li - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We study the problem of 3D semantic segmentation from raw point clouds. Unlike existing
methods, which rely primarily on large amounts of human annotations for training neural …