MixCon3D: Synergizing Multi-View and Cross-Modal Contrastive Learning for Enhancing 3D Representation

Y Gao, Z Wang, WS Zheng, C Xie, Y Zhou - arXiv preprint arXiv …, 2023 - arxiv.org
Contrastive learning has emerged as a promising paradigm for 3D open-world
understanding, jointly with text, image, and point cloud. In this paper, we introduce …

Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

Y Gao, Z Wang, WS Zheng, C Xie… - Proceedings of the …, 2024 - openaccess.thecvf.com
Contrastive learning has emerged as a promising paradigm for 3D open-world
understanding, i.e., aligning point cloud representation to image and text embedding space …
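
The snippet above describes CLIP-style tri-modal alignment. As a rough illustration only (not this paper's actual method or code), the sketch below shows a symmetric InfoNCE loss that pulls a point-cloud embedding toward its paired image and text embeddings; the function name `info_nce`, the encoder placeholders, and the dimensions are assumptions made for this example.

```python
# Illustrative sketch of tri-modal contrastive alignment (assumed names/shapes,
# not the paper's implementation).
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between two batches of paired embeddings."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature              # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)  # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy embeddings standing in for encoder outputs (B samples, D dims).
B, D = 8, 512
point_emb = torch.randn(B, D)   # e.g. from a trainable point-cloud encoder
image_emb = torch.randn(B, D)   # e.g. from an image encoder
text_emb  = torch.randn(B, D)   # e.g. from a text encoder

# Align the 3D representation to both the image and the text embedding space.
loss = info_nce(point_emb, image_emb) + info_nce(point_emb, text_emb)
print(loss.item())
```

In this line of work the image and text towers are typically frozen CLIP encoders, with only the 3D encoder being trained against them.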

Contrast with reconstruct: Contrastive 3d representation learning guided by generative pretraining

Z Qi, R Dong, G Fan, Z Ge, X Zhang… - … on Machine Learning, 2023 - proceedings.mlr.press
Mainstream 3D representation learning approaches are built upon contrastive or generative
modeling pretext tasks, where great improvements in performance on various downstream …

Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

Y Tang, J Liu, D Wang, Z Wang, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models have recently emerged as a prominent focus of interest, attaining
superior performance in widespread scenarios. Due to the scarcity of 3D data, many efforts …

Masked scene contrast: A scalable framework for unsupervised 3d representation learning

X Wu, X Wen, X Liu, H Zhao - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
As a pioneering work, PointContrast conducts unsupervised 3D representation learning by
leveraging contrastive learning over raw RGB-D frames and proves its effectiveness on …
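
For readers unfamiliar with the PointContrast-style objective named in this abstract, the following is a minimal, hypothetical sketch (not the authors' code): per-point features of corresponding points from two views of the same scene are treated as positives in an InfoNCE loss, with other matched points serving as negatives. The helper `point_info_nce` and the toy shapes are assumptions for illustration.

```python
# Illustrative sketch of point-level contrastive learning across two views
# (assumed names/shapes, not the paper's implementation).
import torch
import torch.nn.functional as F

def point_info_nce(feat_a: torch.Tensor, feat_b: torch.Tensor,
                   matches: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """feat_a, feat_b: (N, D) per-point features from two views of a scene.
    matches: (M, 2) long tensor of corresponding point indices (i in A, j in B)."""
    fa = F.normalize(feat_a[matches[:, 0]], dim=-1)   # (M, D) matched features, view A
    fb = F.normalize(feat_b[matches[:, 1]], dim=-1)   # (M, D) matched features, view B
    logits = fa @ fb.t() / temperature                # (M, M); diagonal entries are positives
    targets = torch.arange(fa.size(0), device=fa.device)
    return F.cross_entropy(logits, targets)

# Toy per-point features and (random, placeholder) correspondences.
N, D, M = 1024, 96, 256
feat_view1 = torch.randn(N, D)
feat_view2 = torch.randn(N, D)
matches = torch.stack([torch.randperm(N)[:M], torch.randperm(N)[:M]], dim=1)
print(point_info_nce(feat_view1, feat_view2, matches).item())
```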

Towards Unified Representation of Multi-Modal Pre-training for 3D Understanding via Differentiable Rendering

B Fei, Y Li, W Yang, L Ma, Y He - arXiv preprint arXiv:2404.13619, 2024 - arxiv.org
State-of-the-art 3D models, which excel in recognition tasks, typically depend on large-scale
datasets and well-defined category sets. Recent advances in multi-modal pre-training have …

Autoencoders as cross-modal teachers: Can pretrained 2d image transformers help 3d representation learning?

R Dong, Z Qi, L Zhang, J Zhang, J Sun, Z Ge… - arXiv preprint arXiv …, 2022 - arxiv.org
The success of deep learning heavily relies on large-scale data with comprehensive labels,
which is more expensive and time-consuming to fetch in 3D compared to 2D images or …

GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data

H Li, Y Zhou, Y Zeng, H Xu, X Liang - arXiv preprint arXiv:2402.06198, 2024 - arxiv.org
3D shapes represented as point clouds have seen advancements in multimodal pre-training
to align with image and language descriptions, which is crucial for object identification …

JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues

J Ji, H Wang, C Wu, Y Ma, X Sun, R Ji - arXiv preprint arXiv:2310.09503, 2023 - arxiv.org
3D representation learning is of rising importance, being pivotal in computer vision, autonomous
driving, and robotics. However, a prevailing trend, which straightforwardly …

Learning 3d representations from 2d pre-trained models via image-to-point masked autoencoders

R Zhang, L Wang, Y Qiao, P Gao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Pre-training on abundant image data has become the de facto standard for robust 2D representations. In
contrast, owing to expensive data processing, a paucity of 3D datasets severely hinders …