Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers

H Huang, Z Wang, R Huang, L Liu, X Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent research has evidenced the significant potential of Large Language Models (LLMs)
in handling challenging tasks within 3D scenes. However, current models are constrained to …

Chat-3D: Data-efficiently tuning large language model for universal dialogue of 3D scenes

Z Wang, H Huang, Y Zhao, Z Zhang, Z Zhao - arXiv preprint arXiv …, 2023 - arxiv.org
3D scene understanding has gained significant attention due to its wide range of
applications. However, existing methods for 3D scene understanding are limited to specific …

LGR-NET: Language Guided Reasoning Network for Referring Expression Comprehension

M Lu, R Li, F Feng, Z Ma, X Wang - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Referring Expression Comprehension (REC) is a fundamental task in the vision and
language domain, which aims to locate an image region according to a natural language …

Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

Y Gao, Z Wang, WS Zheng, C Xie… - Proceedings of the …, 2024 - openaccess.thecvf.com
Contrastive learning has emerged as a promising paradigm for 3D open-world
understanding, i.e., aligning point cloud representation to image and text embedding space …
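
For context on the contrastive alignment this entry refers to, a minimal illustrative sketch of a CLIP-style symmetric InfoNCE objective between point-cloud, image, and text embeddings; the function names, dimensions, and random stand-in features are assumptions, not the paper's actual implementation:

import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    # Symmetric contrastive loss between two batches of embeddings of shape (N, D).
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                      # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)    # matching pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def tri_modal_loss(pc_emb, img_emb, txt_emb):
    # Pull point-cloud features toward both the image and the text embedding spaces.
    return info_nce(pc_emb, img_emb) + info_nce(pc_emb, txt_emb)

# Random tensors stand in for encoder outputs in this sketch.
pc, img, txt = (torch.randn(8, 512) for _ in range(3))
print(tri_modal_loss(pc, img, txt))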

Multi3DRefer: Grounding text description to multiple 3D objects

Y Zhang, ZM Gong, AX Chang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We introduce the task of localizing a flexible number of objects in real-world 3D scenes
using natural language descriptions. Existing 3D visual grounding tasks focus on localizing …

ReferIt3D: Neural listeners for fine-grained 3D object identification in real-world scenes

P Achlioptas, A Abdelreheem, F Xia… - Computer Vision–ECCV …, 2020 - Springer
In this work we study the problem of using referential language to identify common objects in
real-world 3D scenes. We focus on a challenging setup where the referred object belongs to …

X-Trans2Cap: Cross-modal knowledge transfer using transformer for 3D dense captioning

Z Yuan, X Yan, Y Liao, Y Guo, G Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
3D dense captioning aims to describe individual objects by natural language in 3D
scenes, where 3D scenes are usually represented as RGB-D scans or point clouds …

Learning semantics on radar point-clouds

ST Isele, F Klein, M Brosowsky… - 2021 IEEE Intelligent …, 2021 - ieeexplore.ieee.org
Localization and perception research for Autonomous Driving is mainly focused on camera
and LiDAR data, rarely on radar data. We apply an automated labeling pipeline to …

Question Generation for Uncertainty Elimination in Referring Expressions in 3D Environments

F Matsuzawa, Y Qiu, K Iwata… - … on Robotics and …, 2023 - ieeexplore.ieee.org
We introduce a new task of question generation to eliminate the uncertainty of referring
expressions in 3D indoor environments (3D-REQ). Referring to an object using natural …

CenterRadarNet: Joint 3D Object Detection and Tracking Framework using 4D FMCW Radar

JH Cheng, SY Kuan, H Latapie, G Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Robust perception is a vital component for ensuring safe autonomous and assisted driving.
Automotive radar (77 to 81 GHz), which offers weather-resilient sensing, provides a …