Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

Y Gao, Z Wang, WS Zheng, C Xie… - Proceedings of the …, 2024 - openaccess.thecvf.com
Contrastive learning has emerged as a promising paradigm for 3D open-world
understanding, i.e., aligning point cloud representation to image and text embedding space …

MixCon3D: Synergizing Multi-View and Cross-Modal Contrastive Learning for Enhancing 3D Representation

Y Gao, Z Wang, WS Zheng, C Xie, Y Zhou - arXiv preprint arXiv …, 2023 - arxiv.org
Contrastive learning has emerged as a promising paradigm for 3D open-world
understanding, jointly with text, image, and point cloud. In this paper, we introduce …

CLIP2: Contrastive language-image-point pretraining from real-world point cloud data

Y Zeng, C Jiang, J Mao, J Han, C Ye… - Proceedings of the …, 2023 - openaccess.thecvf.com
Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled
text-image pairs, has demonstrated great performance in open-world vision understanding …

GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data

H Li, Y Zhou, Y Zeng, H Xu, X Liang - arXiv preprint arXiv:2402.06198, 2024 - arxiv.org
3D shapes represented as point clouds have achieved advancements in multimodal pre-training
to align image and language descriptions, which is crucial to object identification …

Context-aware alignment and mutual masking for 3D-language pre-training

Z Jin, M Hayat, Y Yang, Y Guo… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
3D visual language reasoning plays an important role in effective human-computer
interaction. The current approaches for 3D visual reasoning are task-specific, and lack pre …

Joint representation learning for text and 3D point cloud

R Huang, X Pan, H Zheng, H Jiang, Z Xie, C Wu… - Pattern Recognition, 2024 - Elsevier
Recent advancements in vision-language pre-training (e.g., CLIP) have enabled 2D vision
models to benefit from language supervision. However, the joint representation learning of …

RegionPLC: Regional point-language contrastive learning for open-world 3D scene understanding

J Yang, R Ding, W Deng, Z Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We propose a lightweight and scalable Regional Point-Language Contrastive learning
framework, namely RegionPLC, for open-world 3D scene understanding, aiming to identify …

Towards large-scale 3D representation learning with multi-dataset point prompt training

X Wu, Z Tian, X Wen, B Peng, X Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
The rapid advancement of deep learning models is often attributed to their ability to leverage
massive training data. In contrast, such privilege has not yet fully benefited 3D deep learning …

OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images

Y Mao, J Jing, K Mikolajczyk - arXiv preprint arXiv:2404.16538, 2024 - arxiv.org
Recent advances in Vision and Language Models (VLMs) have improved open-world 3D
representation, facilitating 3D zero-shot capability in unseen categories. Existing open-world …

3D Vision and Language Pretraining with Large-Scale Synthetic Data

D Yang, Z Xu, W Mo, Q Chen, S Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
3D Vision-Language Pre-training (3D-VLP) aims to provide a pre-trained model that can
bridge 3D scenes with natural language, which is an important technique for embodied …