Pointclip v2: Prompting clip and gpt for powerful 3d open-world learning

Z Han, C Gao, J Liu, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org

Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Cited by 47  Related articles  All 2 versions

[PDF] thecvf.com

Ulip-2: Towards scalable multimodal pre-training for 3d understanding

L Xue, N Yu, S Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recent advancements in multimodal pre-training have shown promising efficacy in 3D
representation learning by aligning multimodal features across 3D shapes, their 2D …

Cited by 61  Related articles  All 3 versions

[PDF] arxiv.org

Point-bind & point-llm: Aligning point cloud with multi-modality for 3d understanding, generation, and instruction following

Z Guo, R Zhang, X Zhu, Y Tang, X Ma, J Han… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image,
language, audio, and video. Guided by ImageBind, we construct a joint embedding space …

Cited by 56  Related articles  All 3 versions

[PDF] arxiv.org

Woodpecker: Hallucination correction for multimodal large language models

S Yin, C Fu, S Zhao, T Xu, H Wang, D Sui… - arXiv preprint arXiv …, 2023 - arxiv.org

Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large
Language Models (MLLMs), referring to the phenomenon that the generated text is …

Cited by 71  Related articles  All 2 versions

[PDF] thecvf.com

Binding touch to everything: Learning unified multimodal tactile representations

F Yang, C Feng, Z Chen, H Park… - Proceedings of the …, 2024 - openaccess.thecvf.com

The ability to associate touch with other modalities has huge implications for humans and
computational systems. However, multimodal learning with touch remains challenging due to …

Cited by 14  Related articles  All 4 versions

[PDF] arxiv.org

Pointllm: Empowering large language models to understand point clouds

R Xu, X Wang, T Wang, Y Chen, J Pang… - arXiv preprint arXiv …, 2023 - arxiv.org

The unprecedented advancements in Large Language Models (LLMs) have created a
profound impact on natural language processing but are yet to fully embrace the realm of 3D …

Cited by 56  Related articles  All 3 versions

[PDF] thecvf.com

Gpt4point: A unified framework for point-language understanding and generation

Z Qi, Y Fang, Z Sun, X Wu, T Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Multimodal Large Language Models (MLLMs) have excelled in 2D image-text
comprehension and image generation but their understanding of the 3D world is notably …

Cited by 11  Related articles  All 3 versions

[PDF] thecvf.com

Language embedded 3d gaussians for open-vocabulary scene understanding

JC Shi, M Wang, HB Duan… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Open-vocabulary querying in 3D space is challenging but essential for scene understanding
tasks such as object localization and segmentation. Language-embedded scene …

Cited by 16  Related articles  All 3 versions

[PDF] thecvf.com

Visual programming for zero-shot open-vocabulary 3d visual grounding

Z Yuan, J Ren, CM Feng, H Zhao… - Proceedings of the …, 2024 - openaccess.thecvf.com

3D Visual Grounding (3DVG) aims at localizing 3D objects based on textual
descriptions. Conventional supervised methods for 3DVG often necessitate extensive …

Cited by 6  Related articles  All 3 versions

[PDF] thecvf.com

Open3dsg: Open-vocabulary 3d scene graphs from point clouds with queryable objects and open-set relationships

S Koch, N Vaskevicius, M Colosi… - Proceedings of the …, 2024 - openaccess.thecvf.com

Current approaches for 3D scene graph prediction rely on labeled datasets to train models
for a fixed set of known object classes and relationship categories. We present Open3DSG …

Cited by 6  Related articles  All 3 versions
