Large 2D vision-language models (2D-LLMs) have gained significant attention by bridging Large Language Models (LLMs) with images using a simple projector. Inspired by their …

C Zhu, T Wang, W Zhang, K Chen, X Liu - arXiv preprint arXiv:2407.01525, 2024 - arxiv.org
Although great progress has been made in 3D visual grounding, current models still rely on explicit textual descriptions for grounding and lack the ability to reason about human intentions …

B Jin, Y Zheng, P Li, W Li, Y Zheng, S Hu, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
3D dense captioning stands as a cornerstone in achieving a comprehensive understanding of 3D scenes through natural language. It has recently witnessed remarkable achievements …

A Xiao, X Zhang, L Shao, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
In the past decade, deep neural networks have achieved significant progress in point cloud learning. However, collecting large-scale precisely-annotated point clouds is extremely …

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
With the recent advancement in large language models (LLMs), there is a growing interest in combining LLMs with multimodal learning. Previous surveys of multimodal large language …

Prior studies on 3D scene understanding have primarily developed specialized models for specific tasks or required task-specific fine-tuning. In this study, we propose Grounded 3D …

The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed, and storage efficiency, and is widely preferred in various applications. However, given its …

T Luo, J Johnson, H Lee - arXiv preprint arXiv:2404.07984, 2024 - arxiv.org
Scalable annotation approaches are crucial for constructing extensive 3D-text datasets, facilitating a broader range of applications. However, existing methods sometimes lead to …

As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and …