object-level 3D scene understanding. Apart from coarse semantic class prediction and
bounding box regression as in traditional 3D object detection, 3D dense captioning aims at
producing a further and finer instance-level label of natural language description on visual
appearance and spatial relations for each scene object of interest. To detect and describe
objects in a scene, following the spirit of neural machine translation, we propose a …