CityRefer: geography-aware 3D visual grounding dataset on city-scale point cloud data

T Miyanishi, F Kitamori, S Kurita, J Lee… - arXiv preprint arXiv …, 2023 - arxiv.org
City-scale 3D point cloud is a promising way to express detailed and complicated outdoor
structures. It encompasses both the appearance and geometry features of segmented city …

RPCS v2.0: Object-detection-based recurrent point cloud selection method for 3D dense captioning

S Hayashi, Z Zhang, J Zhou - Neurocomputing, 2024 - Elsevier
3D dense captioning is the process of generating natural language descriptions for
objects in a 3D scene, represented as RGB-D scans or point clouds. Three problems …

Dense captioning and multidimensional evaluations for indoor robotic scenes

H Wang, W Wang, W Li, H Liu - Frontiers in Neurorobotics, 2023 - frontiersin.org
The field of human-computer interaction is expanding, especially within the domain of
intelligent technologies. Scene understanding, which entails the generation of advanced …

Open-Ended 3D Point Cloud Instance Segmentation

PDA Nguyen, M Luu, A Tran, C Pham… - arXiv preprint arXiv …, 2024 - arxiv.org
Open-Vocab 3D Instance Segmentation methods (OV-3DIS) have recently demonstrated
their ability to generalize to unseen objects. However, these methods still depend on …

Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization

Y Luo, H Lin, X Zheng, Y Jiang, F Chao, J Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
3D Visual Grounding (3DVG) and 3D Dense Captioning (3DDC) are two crucial tasks in
various 3D applications, which require both shared and complementary information in …

Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans

T Miyanishi, D Azuma, S Kurita… - … Conference on 3D …, 2024 - ieeexplore.ieee.org
We present a novel task for cross-dataset visual grounding in 3D scenes (Cross3DVG),
which overcomes limitations of existing 3D visual grounding models, specifically their …

Novel 3D Scene Understanding Applications From Recurrence in a Single Image

S Zhang, S Bharadwaj, K Kraiger, Y Asthana… - arXiv preprint arXiv …, 2022 - arxiv.org
We demonstrate the utility of recurring pattern discovery from a single image for spatial
understanding of a 3D scene in terms of (1) vanishing point detection, (2) hypothesizing 3D …

Quat-DGNet: Enhancing 3D Dense Captioning with Quaternion-Based Spatial Offsets and Dynamic Neighborhood Graphs

S Li, X Su, J Li, F Zhang - Chinese Conference on Pattern Recognition and …, 2024 - Springer
3D dense captioning aims at generating more detailed and accurate descriptions
for objects in a 3D scene. Since the one-stage (detect-and-describe) model does not have a …

A Survey of Language-Grounded Multimodal 3D Scene Understanding

R Ren, X Zhao, W Xu, J Cao, X Xu, X Zhang - Available at SSRN 4992295 - papers.ssrn.com
As an emergent task bridging vision and language, Language-grounded Multimodal 3D
Scene Understanding (3D-LMSU) has attracted significant interest across various domains …

Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds (Supplementary Materials)

H Wang, C Zhang, J Yu, W Cai - spacap3d.github.io
In this supplementary for SpaCap3D [Wang et al., 2022], we provide more details of the
learnable positional encoding in Section 1. We visualize the attention mechanism used in …