S Chen, H Zhu, X Chen, Y Lei… - Proceedings of the …, 2023 - openaccess.thecvf.com
3D dense captioning aims to generate multiple captions localized with their associated object regions. Existing methods follow a sophisticated "detect-then-describe" …
3D dense captioning aims to describe individual objects by natural language in 3D scenes, where 3D scenes are usually represented as RGB-D scans or point clouds …
A Mao, Z Yang, W Chen, R Yi… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
3D dense captioning aims to semantically describe each object detected in a 3D scene, which plays a significant role in 3D scene understanding. Previous works lack a complete …
Dense captioning in 3D point clouds is an emerging vision-and-language task involving object-level 3D scene understanding. Apart from coarse semantic class prediction and …
S Chen, H Zhu, M Li, X Chen, P Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
3D dense captioning requires a model to translate its understanding of an input 3D scene into several captions associated with different object regions. Existing methods adopt a …
Y Luo, H Lin, X Zheng, Y Jiang, F Chao, J Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
3D Visual Grounding (3DVG) and 3D Dense Captioning (3DDC) are two crucial tasks in various 3D applications, which require both shared and complementary information in …
3D dense captioning is a recently proposed novel task, where point clouds contain more geometric information than the 2D counterpart. However, it is also more challenging …
In this supplementary material, we provide results on the ReferIt3D dataset in Sec. 1. To showcase the effectiveness of our speaker-listener architecture, we provide additional …
There has been a lot of previous work in the area of cross-modal scene understanding, especially for the tasks of image captioning [32] and the related task of dense captioning …