Related articles - Academic resource search

Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

Y Gao, Z Wang, WS Zheng, C Xie… - Proceedings of the …, 2024 - openaccess.thecvf.com

Contrastive learning has emerged as a promising paradigm for 3D open-world
understanding, i.e., aligning point cloud representation to image and text embedding space …

[PDF] arxiv.org

MixCon3D: Synergizing Multi-View and Cross-Modal Contrastive Learning for Enhancing 3D Representation

Y Gao, Z Wang, WS Zheng, C Xie, Y Zhou - arXiv preprint arXiv …, 2023 - arxiv.org

Contrastive learning has emerged as a promising paradigm for 3D open-world
understanding, jointly with text, image, and point cloud. In this paper, we introduce …

Cited by 1 - Related articles - All 3 versions

[PDF] thecvf.com

CLIP2: Contrastive language-image-point pretraining from real-world point cloud data

Y Zeng, C Jiang, J Mao, J Han, C Ye… - Proceedings of the …, 2023 - openaccess.thecvf.com

Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled
text-image pairs, has demonstrated great performance in open-world vision understanding …

Cited by 63 - Related articles - All 5 versions

[PDF] arxiv.org

GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data

H Li, Y Zhou, Y Zeng, H Xu, X Liang - arXiv preprint arXiv:2402.06198, 2024 - arxiv.org

3D shape represented as point cloud has achieved advancements in multimodal pre-training
to align image and language descriptions, which is crucial to object identification …

Context-aware alignment and mutual masking for 3D-language pre-training

Z Jin, M Hayat, Y Yang, Y Guo… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

3D visual language reasoning plays an important role in effective human-computer
interaction. The current approaches for 3D visual reasoning are task-specific, and lack pre …

Cited by 28 - Related articles - All 3 versions

[PDF] arxiv.org

Joint representation learning for text and 3D point cloud

R Huang, X Pan, H Zheng, H Jiang, Z Xie, C Wu… - Pattern Recognition, 2024 - Elsevier

Recent advancements in vision-language pre-training (e.g., CLIP) have enabled 2D vision
models to benefit from language supervision. However, the joint representation learning of …

Cited by 10 - Related articles - All 4 versions

[PDF] thecvf.com

RegionPLC: Regional point-language contrastive learning for open-world 3D scene understanding

J Yang, R Ding, W Deng, Z Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

We propose a lightweight and scalable Regional Point-Language Contrastive learning
framework, namely RegionPLC, for open-world 3D scene understanding, aiming to identify …

Cited by 31 - Related articles - All 3 versions

[PDF] thecvf.com

Towards large-scale 3D representation learning with multi-dataset point prompt training

X Wu, Z Tian, X Wen, B Peng, X Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

The rapid advancement of deep learning models is often attributed to their ability to leverage
massive training data. In contrast, such privilege has not yet fully benefited 3D deep learning …

Cited by 17 - Related articles - All 3 versions

[PDF] arxiv.org

OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images

Y Mao, J Jing, K Mikolajczyk - arXiv preprint arXiv:2404.16538, 2024 - arxiv.org

Recent advances in Vision and Language Models (VLMs) have improved open-world 3D
representation, facilitating 3D zero-shot capability in unseen categories. Existing open-world …

3D Vision and Language Pretraining with Large-Scale Synthetic Data

D Yang, Z Xu, W Mo, Q Chen, S Huang… - arXiv preprint arXiv …, 2024 - arxiv.org

3D Vision-Language Pre-training (3D-VLP) aims to provide a pre-trained model which can
bridge 3D scenes with natural language, which is an important technique for embodied …
