Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers

H Huang, Z Wang, R Huang, L Liu, X Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent research has evidenced the significant potential of Large Language Models (LLMs)
in handling challenging tasks within 3D scenes. However, current models are constrained to …

Chat-3D: Data-efficiently tuning large language model for universal dialogue of 3D scenes

Z Wang, H Huang, Y Zhao, Z Zhang, Z Zhao - arXiv preprint arXiv …, 2023 - arxiv.org
3D scene understanding has gained significant attention due to its wide range of
applications. However, existing methods for 3D scene understanding are limited to specific …

LGR-NET: Language Guided Reasoning Network for Referring Expression Comprehension

M Lu, R Li, F Feng, Z Ma, X Wang - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Referring Expression Comprehension (REC) is a fundamental task in the vision and
language domain, which aims to locate an image region according to a natural language …

Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

Y Gao, Z Wang, WS Zheng, C Xie… - Proceedings of the …, 2024 - openaccess.thecvf.com
Contrastive learning has emerged as a promising paradigm for 3D open-world
understanding, i.e., aligning point cloud representation to image and text embedding space …
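
For context on the contrastive alignment this entry refers to, a minimal illustrative sketch of a CLIP-style symmetric InfoNCE objective between point-cloud, image, and text embeddings; the function names, dimensions, and random stand-in features are assumptions, not the paper's actual implementation:

import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    # Symmetric contrastive loss between two batches of embeddings of shape (N, D).
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                      # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)    # matching pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def tri_modal_loss(pc_emb, img_emb, txt_emb):
    # Pull point-cloud features toward both the image and the text embedding spaces.
    return info_nce(pc_emb, img_emb) + info_nce(pc_emb, txt_emb)

# Random tensors stand in for encoder outputs in this sketch.
pc, img, txt = (torch.randn(8, 512) for _ in range(3))
print(tri_modal_loss(pc, img, txt))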

Multi3DRefer: Grounding text description to multiple 3D objects

Y Zhang, ZM Gong, AX Chang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We introduce the task of localizing a flexible number of objects in real-world 3D scenes
using natural language descriptions. Existing 3D visual grounding tasks focus on localizing …

ReferIt3D: Neural listeners for fine-grained 3D object identification in real-world scenes

P Achlioptas, A Abdelreheem, F Xia… - Computer Vision–ECCV …, 2020 - Springer
In this work we study the problem of using referential language to identify common objects in
real-world 3D scenes. We focus on a challenging setup where the referred object belongs to …

X-Trans2Cap: Cross-modal knowledge transfer using transformer for 3D dense captioning

Z Yuan, X Yan, Y Liao, Y Guo, G Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
3D dense captioning aims to describe individual objects by natural language in 3D
scenes, where 3D scenes are usually represented as RGB-D scans or point clouds …

Learning semantics on radar point-clouds

ST Isele, F Klein, M Brosowsky… - 2021 IEEE Intelligent …, 2021 - ieeexplore.ieee.org
Localization and perception research for Autonomous Driving is mainly focused on camera
and LiDAR data, rarely on radar data. We apply an automated labeling pipeline to …

Question Generation for Uncertainty Elimination in Referring Expressions in 3D Environments

F Matsuzawa, Y Qiu, K Iwata… - … on Robotics and …, 2023 - ieeexplore.ieee.org
We introduce a new task of question generation to eliminate the uncertainty of referring
expressions in 3D indoor environments (3D-REQ). Referring to an object using natural …

CenterRadarNet: Joint 3D Object Detection and Tracking Framework using 4D FMCW Radar

JH Cheng, SY Kuan, H Latapie, G Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Robust perception is a vital component for ensuring safe autonomous and assisted driving.
Automotive radar (77 to 81 GHz), which offers weather-resilient sensing, provides a …