Spatiality-guided transformer for 3D dense captioning on point clouds

T Miyanishi, F Kitamori, S Kurita, J Lee… - arXiv preprint arXiv …, 2023 - arxiv.org

City-scale 3D point cloud is a promising way to express detailed and complicated outdoor
structures. It encompasses both the appearance and geometry features of segmented city …

Cited by 3

RPCS v2.0: Object-detection-based recurrent point cloud selection method for 3D dense captioning

S Hayashi, Z Zhang, J Zhou - Neurocomputing, 2024 - Elsevier

3D dense captioning is the process of generating natural language descriptions for
objects in a 3D scene, represented as RGB-D scans or point clouds. Three problems …

Dense captioning and multidimensional evaluations for indoor robotic scenes

H Wang, W Wang, W Li, H Liu - Frontiers in Neurorobotics, 2023 - frontiersin.org

The field of human-computer interaction is expanding, especially within the domain of
intelligent technologies. Scene understanding, which entails the generation of advanced …

Open-Ended 3D Point Cloud Instance Segmentation

PDA Nguyen, M Luu, A Tran, C Pham… - arXiv preprint arXiv …, 2024 - arxiv.org

Open-Vocab 3D Instance Segmentation methods (OV-3DIS) have recently demonstrated
their ability to generalize to unseen objects. However, these methods still depend on …

Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization

Y Luo, H Lin, X Zheng, Y Jiang, F Chao, J Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

3D Visual Grounding (3DVG) and 3D Dense Captioning (3DDC) are two crucial tasks in
various 3D applications, which require both shared and complementary information in …

Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans

T Miyanishi, D Azuma, S Kurita… - … Conference on 3D …, 2024 - ieeexplore.ieee.org

We present a novel task for cross-dataset visual grounding in 3D scenes (Cross3DVG),
which overcomes limitations of existing 3D visual grounding models, specifically their …

Novel 3D Scene Understanding Applications From Recurrence in a Single Image

S Zhang, S Bharadwaj, K Kraiger, Y Asthana… - arXiv preprint arXiv …, 2022 - arxiv.org

We demonstrate the utility of recurring pattern discovery from a single image for spatial
understanding of a 3D scene in terms of (1) vanishing point detection, (2) hypothesizing 3D …

Quat-DGNet: Enhancing 3D Dense Captioning with Quaternion-Based Spatial Offsets and Dynamic Neighborhood Graphs

S Li, X Su, J Li, F Zhang - Chinese Conference on Pattern Recognition and …, 2024 - Springer

3D dense captioning aims at generating more detailed and accurate descriptions
for objects in a 3D scene. Since the one-stage (detect-and-describe) model does not have a …

A Survey of Language-Grounded Multimodal 3D Scene Understanding

R Ren, X Zhao, W Xu, J Cao, X Xu, X Zhang - Available at SSRN 4992295 - papers.ssrn.com

As an emergent task bridging vision and language, Language-grounded Multimodal 3D
Scene Understanding (3D-LMSU) has attracted significant interest across various domains …

Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds (Supplementary Materials)

H Wang, C Zhang, J Yu, W Cai - spacap3d.github.io

In this supplementary for SpaCap3D [Wang et al., 2022], we provide more details of the
learnable positional encoding in Section 1. We visualize the attention mechanism used in …
