Toward explainable and fine-grained 3D grounding through referring textual phrases

Z Yuan, X Yan, Z Li, X Li, Y Guo, S Cui, Z Li - arXiv preprint arXiv …, 2022 - arxiv.org
Recent progress in 3D scene understanding has explored visual grounding (3DVG) to
localize a target object through a language description. However, existing methods only …

Multi3DRefer: Grounding text description to multiple 3D objects

Y Zhang, ZM Gong, AX Chang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We introduce the task of localizing a flexible number of objects in real-world 3D scenes
using natural language descriptions. Existing 3D visual grounding tasks focus on localizing …

ViewRefer: Grasp the multi-view knowledge for 3D visual grounding with GPT and prototype guidance

Z Guo, Y Tang, R Zhang, D Wang, Z Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Understanding 3D scenes from multi-view inputs has been proven to alleviate the view
discrepancy issue in 3D visual grounding. However, existing methods normally neglect the …

Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment

X Xu, Y Yuan, Q Zhang, W Wu, Z Jie, L Ma… - arXiv preprint arXiv …, 2023 - arxiv.org
Learning to ground natural language queries to target objects or regions in 3D point clouds
is quite essential for 3D scene understanding. Nevertheless, existing 3D visual grounding …

Exploiting contextual objects and relations for 3D visual grounding

L Yang, Z Zhang, Z Qi, Y Xu, W Liu… - Advances in …, 2024 - proceedings.neurips.cc
3D visual grounding, the task of identifying visual objects in 3D scenes based on
natural language inputs, plays a critical role in enabling machines to understand and …

DOrA: 3D Visual Grounding with Order-Aware Referring

TY Wu, SY Huang, YCF Wang - arXiv preprint arXiv:2403.16539, 2024 - arxiv.org
3D visual grounding aims to identify the target object within a 3D point cloud scene referred
to by a natural language description. While previous works attempt to exploit the verbo …

Distilling coarse-to-fine semantic matching knowledge for weakly supervised 3D visual grounding

Z Wang, H Huang, Y Zhao, L Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
3D visual grounding involves finding a target object in a 3D scene that corresponds
to a given sentence query. Although many approaches have been proposed and achieved …

Dense object grounding in 3D scenes

W Huang, D Liu, W Hu - Proceedings of the 31st ACM International …, 2023 - dl.acm.org
Localizing objects in 3D scenes according to the semantics of a given natural language is a
fundamental yet important task in the field of multimedia understanding, which benefits …

ViewRefer: Grasp the multi-view knowledge for 3D visual grounding

Z Guo, Y Tang, R Zhang, D Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Understanding 3D scenes from multi-view inputs has been proven to alleviate the view
discrepancy issue in 3D visual grounding. However, existing methods normally neglect the …

EDA: Explicit text-decoupling and dense alignment for 3D visual grounding

Y Wu, X Cheng, R Zhang, Z Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
3D visual grounding aims to find the object within point clouds mentioned by free-form
natural language descriptions with rich semantic cues. However, existing methods …