Frontier research and latest trends in "3D vision-language" reasoning technology

雷印杰, 徐凯, 郭裕兰, 杨鑫, 武玉伟, 胡玮, 杨佳琪… - 中国图象图形学报, 2024 - cjig.cn
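The core idea of 3D visual reasoning is to understand the relationships among visual entities in point-cloud scenes. Non-expert users have difficulty
conveying their intentions to a computer, which limits the adoption and spread of the technology. To address this, researchers use natural language as the semantic …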

CLIP-guided vision-language pre-training for question answering in 3D scenes

M Parelli, A Delitzas, N Hars… - Proceedings of the …, 2023 - openaccess.thecvf.com
Training models to apply linguistic knowledge and visual concepts from 2D images to 3D
world understanding is a promising direction that researchers have only recently started to …

Situational Awareness Matters in 3D Vision Language Reasoning

Y Man, LY Gui, YX Wang - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Being able to carry out complicated vision language reasoning tasks in 3D space represents
a significant milestone in developing household robots and human-centered embodied AI …

Think-Program-reCtify: 3D Situated Reasoning with Large Language Models

Q He, K Lin, S Chen, A Hu, Q Jin - arXiv preprint arXiv:2404.14705, 2024 - arxiv.org
This work addresses the 3D situated reasoning task which aims to answer questions given
egocentric observations in a 3D environment. The task remains challenging as it requires …

3D spatial multimodal knowledge accumulation for scene graph prediction in point cloud

M Feng, H Hou, L Zhang, Z Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
In-depth understanding of a 3D scene not only involves locating/recognizing individual
objects, but also requires to infer the relationships and interactions among them. However …

3D concept learning and reasoning from multi-view images

Y Hong, C Lin, Y Du, Z Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Humans are able to accurately reason in 3D by gathering multi-view observations of the
surrounding world. Inspired by this insight, we introduce a new large-scale benchmark for …

DOMINO: A Dual-System for Multi-step Visual Language Reasoning

P Wang, O Golovneva, A Aghajanyan, X Ren… - arXiv preprint arXiv …, 2023 - arxiv.org
Visual language reasoning requires a system to extract text or numbers from information-
dense images like charts or plots and perform logical or arithmetic reasoning to arrive at an …

Interpretable3D: An ad-hoc interpretable classifier for 3D point clouds

T Feng, R Quan, X Wang, W Wang… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
3D decision-critical tasks urgently require research on explanations to ensure system
reliability and transparency. Extensive explanatory research has been conducted on 2D …

Exploring hierarchical spatial layout cues for 3D point cloud based scene graph prediction

M Feng, H Hou, L Zhang, Y Guo, H Yu… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
3D scene graph prediction is important for intelligent agents to gather information and
perceive semantics of their environments. However, constructing an effective graph is …

VL-SAT: Visual-linguistic semantics assisted training for 3D semantic scene graph prediction in point cloud

Z Wang, B Cheng, L Zhao, D Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com
The task of 3D semantic scene graph (3DSSG) prediction in the point cloud is challenging
since (1) the 3D point cloud only captures geometric structures with limited semantics …