Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds

H Wang, C Zhang, J Yu, W Cai - arXiv preprint arXiv:2204.10688, 2022 - arxiv.org
Dense captioning in 3D point clouds is an emerging vision-and-language task involving
object-level 3D scene understanding. Apart from coarse semantic class prediction and
bounding box regression as in traditional 3D object detection, 3D dense captioning aims at
producing a further and finer instance-level label of natural language description on visual
appearance and spatial relations for each scene object of interest. To detect and describe
objects in a scene, following the spirit of neural machine translation, we propose a …

[PDF] Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds (Supplementary Materials)

H Wang, C Zhang, J Yu, W Cai - spacap3d.github.io
In this supplementary for SpaCap3D [Wang et al., 2022], we provide more details of the
learnable positional encoding in Section 1. We visualize the attention mechanism used in
our SpaCap3D framework in Section 2 and provide more qualitative results of our method in
Section 3.