Related articles - Academic resource search

Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension

R Guan, R Zhang, N Ouyang, J Liu, KL Man… - arXiv preprint arXiv …, 2024 - arxiv.org

Embodied perception is essential for intelligent vehicles and robots, enabling more natural
interaction and task execution. However, these advancements currently embrace vision …

Transcrib3D: 3D Referring Expression Resolution through Large Language Models

J Fang, X Tan, S Lin, I Vasiljevic, V Guizilini… - arXiv preprint arXiv …, 2024 - arxiv.org

If robots are to work effectively alongside people, they must be able to interpret natural
language references to objects in their 3D environment. Understanding 3D referring …

Transcribe3D: Grounding LLMs Using Transcribed Information for 3D Referential Reasoning with Self-Corrected Finetuning

J Fang, X Tan, S Lin, H Mei, M Walter - 2nd Workshop on Language …, 2023 - openreview.net

If robots are to work effectively alongside people, they must be able to interpret natural
language references to objects in their 3D environment. Understanding 3D referring …

Cited by 1 · Related articles · All 2 versions

[PDF] aaai.org

3D-STMN: Dependency-driven superpoint-text matching network for end-to-end 3D referring expression segmentation

C Wu, Y Ma, Q Chen, H Wang, G Luo, J Ji… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

In 3D Referring Expression Segmentation (3D-RES), the earlier approach adopts a two-
stage paradigm, extracting segmentation proposals and then matching them with referring …

Cited by 2 · Related articles · All 2 versions

[PDF] arxiv.org

Spatiality-guided transformer for 3D dense captioning on point clouds

H Wang, C Zhang, J Yu, W Cai - arXiv preprint arXiv:2204.10688, 2022 - arxiv.org

Dense captioning in 3D point clouds is an emerging vision-and-language task involving
object-level 3D scene understanding. Apart from coarse semantic class prediction and …

Cited by 23 · Related articles · All 5 versions

Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model

KC Huang, X Li, L Qi, S Yan, MH Yang - arXiv preprint arXiv:2405.17427, 2024 - arxiv.org

Recent advancements in multimodal large language models (LLMs) have shown their
potential in various domains, especially concept reasoning. Despite these developments …

Language-guided 3D object detection in point cloud for autonomous driving

W Cheng, J Yin, W Li, R Yang, J Shen - arXiv preprint arXiv:2305.15765, 2023 - arxiv.org

This paper addresses the problem of 3D referring expression comprehension (REC) in
autonomous driving scenario, which aims to ground a natural language to the targeted …

Cited by 8 · Related articles · All 2 versions

[PDF] arxiv.org

MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

Y Tang, X Han, X Li, Q Yu, Y Hao, L Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

Large 2D vision-language models (2D-LLMs) have gained significant attention by bridging
Large Language Models (LLMs) with images using a simple projector. Inspired by their …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Vote2Cap-DETR++: Decoupling localization and describing for end-to-end 3D dense captioning

S Chen, H Zhu, M Li, X Chen, P Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

3D dense captioning requires a model to translate its understanding of an input 3D scene
into several captions associated with different object regions. Existing methods adopt a …

Cited by 2 · Related articles · All 2 versions

[PDF] aclanthology.org

ARKitSceneRefer: Text-based Localization of Small Objects in Diverse Real-World 3D Indoor Scenes

S Kato, S Kurita, C Chu… - Findings of the Association …, 2023 - aclanthology.org

3D referring expression comprehension is a task to ground text representations
onto objects in 3D scenes. It is a crucial task for indoor household robots or augmented …

Cited by 1 · Related articles · All 3 versions
