Related articles - Academic resource search

Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization

Y Luo, H Lin, X Zheng, Y Jiang, F Chao, J Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

3D Visual Grounding (3DVG) and 3D Dense Captioning (3DDC) are two crucial tasks in
various 3D applications, which require both shared and complementary information in …

Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases

Z Yuan, X Yan, Z Li, X Li, Y Guo, S Cui, Z Li - arXiv preprint arXiv …, 2022 - arxiv.org

Recent progress in 3D scene understanding has explored visual grounding (3DVG) to
localize a target object through a language description. However, existing methods only …

Cited by 11 · All 2 versions

Grounded 3D-LLM with Referent Tokens

Y Chen, S Yang, H Huang, T Wang, R Lyu, R Xu… - arXiv preprint arXiv …, 2024 - arxiv.org

Prior studies on 3D scene understanding have primarily developed specialized models for
specific tasks or required task-specific fine-tuning. In this study, we propose Grounded 3D …

Vision meets mmWave Radar: 3D Object Perception Benchmark for Autonomous Driving

Y Wang, JH Cheng, JT Huang, SY Kuan, Q Fu… - arXiv preprint arXiv …, 2023 - arxiv.org

Sensor fusion is crucial for an accurate and robust perception system on autonomous
vehicles. Most existing datasets and perception solutions focus on fusing cameras and …

ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes

A Abdelreheem, K Olszewski, HY Lee… - Proceedings of the …, 2024 - openaccess.thecvf.com

The two popular datasets ScanRefer [20] and ReferIt3D [5] connect natural language to real-
world 3D scenes. In this paper, we curate a complementary dataset extending both the …

Cited by 14 · All 5 versions

LERF: Language Embedded Radiance Fields

J Kerr, CM Kim, K Goldberg… - Proceedings of the …, 2023 - openaccess.thecvf.com

Humans describe the physical world using natural language to refer to specific 3D locations
based on a vast range of properties: visual appearance, semantics, abstract associations, or …

Cited by 176 · All 6 versions

WaterVG: Waterway Visual Grounding Based on Text-Guided Vision and mmWave Radar

R Guan, L Jia, F Yang, S Yao, E Purwanto… - arXiv preprint arXiv …, 2024 - arxiv.org

The perception of waterways based on human intent holds significant importance for
autonomous navigation and operations of Unmanned Surface Vehicles (USVs) in water …

Cited by 2 · All 4 versions

X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks

Z Qian, Y Ma, J Ji, X Sun - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org

Referring 3D instance segmentation is a challenging task aimed at accurately segmenting a
target instance within a 3D scene based on a given referring expression. However, previous …

Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training

Z Jin, M Hayat, Y Yang, Y Guo… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

3D visual language reasoning plays an important role in effective human-computer
interaction. The current approaches for 3D visual reasoning are task-specific, and lack pre …

Cited by 20 · All 3 versions

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

Cited by 7 · All 2 versions
