Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Ulip-2: Towards scalable multimodal pre-training for 3d understanding

L Xue, N Yu, S Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advancements in multimodal pre-training have shown promising efficacy in 3D
representation learning by aligning multimodal features across 3D shapes, their 2D …

Point-bind & point-llm: Aligning point cloud with multi-modality for 3d understanding, generation, and instruction following

Z Guo, R Zhang, X Zhu, Y Tang, X Ma, J Han… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image,
language, audio, and video. Guided by ImageBind, we construct a joint embedding space …

Woodpecker: Hallucination correction for multimodal large language models

S Yin, C Fu, S Zhao, T Xu, H Wang, D Sui… - arXiv preprint arXiv …, 2023 - arxiv.org
Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large
Language Models (MLLMs), referring to the phenomenon in which the generated text is …

Binding touch to everything: Learning unified multimodal tactile representations

F Yang, C Feng, Z Chen, H Park… - Proceedings of the …, 2024 - openaccess.thecvf.com
The ability to associate touch with other modalities has huge implications for humans and
computational systems. However, multimodal learning with touch remains challenging due to …

Pointllm: Empowering large language models to understand point clouds

R Xu, X Wang, T Wang, Y Chen, J Pang… - arXiv preprint arXiv …, 2023 - arxiv.org
The unprecedented advancements in Large Language Models (LLMs) have created a
profound impact on natural language processing but are yet to fully embrace the realm of 3D …

Gpt4point: A unified framework for point-language understanding and generation

Z Qi, Y Fang, Z Sun, X Wu, T Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal Large Language Models (MLLMs) have excelled in 2D image-text
comprehension and image generation, but their understanding of the 3D world is notably …

Language embedded 3d gaussians for open-vocabulary scene understanding

JC Shi, M Wang, HB Duan… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Open-vocabulary querying in 3D space is challenging but essential for scene understanding
tasks such as object localization and segmentation. Language-embedded scene …

Visual programming for zero-shot open-vocabulary 3d visual grounding

Z Yuan, J Ren, CM Feng, H Zhao… - Proceedings of the …, 2024 - openaccess.thecvf.com
3D Visual Grounding (3DVG) aims at localizing 3D objects based on textual
descriptions. Conventional supervised methods for 3DVG often necessitate extensive …

Open3dsg: Open-vocabulary 3d scene graphs from point clouds with queryable objects and open-set relationships

S Koch, N Vaskevicius, M Colosi… - Proceedings of the …, 2024 - openaccess.thecvf.com
Current approaches for 3D scene graph prediction rely on labeled datasets to train models
for a fixed set of known object classes and relationship categories. We present Open3DSG …