TY Wu, SY Huang, YCF Wang - arXiv preprint arXiv:2403.16539, 2024
3D visual grounding aims to identify the target object within a 3D point cloud scene referred
to by a natural language description. While previous works attempt to exploit the verbo …