Contrastive learning has emerged as a promising paradigm for 3D open-world understanding, jointly with text, image, and point cloud. In this paper, we introduce …
Y Zeng, C Jiang, J Mao, J Han, C Ye… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled text-image pairs, has demonstrated great performance in open-world vision understanding …
H Li, Y Zhou, Y Zeng, H Xu, X Liang - arXiv preprint arXiv:2402.06198, 2024 - arxiv.org
3D Shape represented as point cloud has achieve advancements in multimodal pre-training to align image and language descriptions, which is curial to object identification …
Z Jin, M Hayat, Y Yang, Y Guo… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Abstract 3D visual language reasoning plays an important role in effective human-computer interaction. The current approaches for 3D visual reasoning are task-specific, and lack pre …
R Huang, X Pan, H Zheng, H Jiang, Z Xie, C Wu… - Pattern Recognition, 2024 - Elsevier
Recent advancements in vision-language pre-training (eg, CLIP) have enabled 2D vision models to benefit from language supervision. However, the joint representation learning of …
We propose a lightweight and scalable Regional Point-Language Contrastive learning framework namely RegionPLC for open-world 3D scene understanding aiming to identify …
The rapid advancement of deep learning models is often attributed to their ability to leverage massive training data. In contrast such privilege has not yet fully benefited 3D deep learning …
Recent advances in Vision and Language Models (VLMs) have improved open-world 3D representation, facilitating 3D zero-shot capability in unseen categories. Existing open-world …
D Yang, Z Xu, W Mo, Q Chen, S Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
3D Vision-Language Pre-training (3D-VLP) aims to provide a pre-train model which can bridge 3D scenes with natural language, which is an important technique for embodied …