Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance

P Nguyen, TD Ngo, E Kalogerakis… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce Open3DIS, a novel solution designed to tackle the problem of Open-Vocabulary
Instance Segmentation within 3D scenes. Objects within 3D environments …

RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding

J Yang, R Ding, W Deng, Z Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We propose a lightweight and scalable Regional Point-Language Contrastive learning
framework, namely RegionPLC, for open-world 3D scene understanding, aiming to identify …

ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning

Q Gu, A Kuwajerwala, S Morin… - arXiv preprint arXiv …, 2023 - arxiv.org
For robots to perform a wide variety of tasks, they require a 3D representation of the world
that is semantically rich, yet compact and efficient for task-driven perception and planning …

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

Grounded 3D-LLM with Referent Tokens

Y Chen, S Yang, H Huang, T Wang, R Lyu, R Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Prior studies on 3D scene understanding have primarily developed specialized models for
specific tasks or required task-specific fine-tuning. In this study, we propose Grounded 3D …

Can 3D Vision-Language Models Truly Understand Natural Language?

W Deng, R Ding, J Yang, J Liu, Y Li, X Qi… - arXiv preprint arXiv …, 2024 - arxiv.org
Rapid advancements in 3D vision-language (3D-VL) tasks have opened up new avenues
for human interaction with embodied agents or robots using natural language. Despite this …

Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

X Zhu, H Zhou, P Xing, L Zhao, H Xu, J Liang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we investigate the use of diffusion models which are pre-trained on large-scale
image-caption pairs for open-vocabulary 3D semantic understanding. We propose a novel …

Generalized Label-Efficient 3D Scene Parsing via Hierarchical Feature Aligned Pre-Training and Region-Aware Fine-tuning

K Liu, YJ Liu, K Tang, M Liu, B Chen - arXiv preprint arXiv:2312.00663, 2023 - arxiv.org
Deep neural network models have achieved remarkable progress in 3D scene
understanding when trained in the closed-set setting and with full labels. However, the major …

Segment Any 3D Object with Language

S Lee, Y Zhao, GH Lee - arXiv preprint arXiv:2404.02157, 2024 - arxiv.org
In this paper, we investigate Open-Vocabulary 3D Instance Segmentation (OV-3DIS) with
free-form language instructions. Earlier works that rely on only annotated base categories for …