Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning

S Chen, H Zhu, M Li, X Chen, P Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
3D dense captioning requires a model to translate its understanding of an input 3D scene
into several captions associated with different object regions. Existing methods adopt a …

Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning

S Chen, H Zhu, M Li, X Chen, P Guo, Y Lei… - arXiv e …, 2023 - ui.adsabs.harvard.edu
3D dense captioning requires a model to translate its understanding of an input 3D
scene into several captions associated with different object regions. Existing methods adopt …