Related articles - Academic Resource Search

[PDF] Frontier research and latest trends in "3D vision-language" reasoning techniques

Lei Yinjie, Xu Kai, Guo Yulan, Yang Xin, Wu Yuwei, Hu Wei, Yang Jiaqi… - Journal of Image and Graphics, 2024 - cjig.cn

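The core idea of 3D visual reasoning is to understand the relationships among visual entities in point cloud scenes. Non-expert users find it difficult to
convey their intentions to a computer, which limits the adoption and spread of this technology. To address this, researchers use natural language as the semantic …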

CLIP-guided vision-language pre-training for question answering in 3D scenes

M Parelli, A Delitzas, N Hars… - Proceedings of the …, 2023 - openaccess.thecvf.com

Training models to apply linguistic knowledge and visual concepts from 2D images to 3D
world understanding is a promising direction that researchers have only recently started to …

Cited by 21 · Related articles · All 7 versions

[PDF] thecvf.com

Situational Awareness Matters in 3D Vision Language Reasoning

Y Man, LY Gui, YX Wang - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Being able to carry out complicated vision language reasoning tasks in 3D space represents
a significant milestone in developing household robots and human-centered embodied AI …

Think-Program-reCtify: 3D Situated Reasoning with Large Language Models

Q He, K Lin, S Chen, A Hu, Q Jin - arXiv preprint arXiv:2404.14705, 2024 - arxiv.org

This work addresses the 3D situated reasoning task which aims to answer questions given
egocentric observations in a 3D environment. The task remains challenging as it requires …

3D spatial multimodal knowledge accumulation for scene graph prediction in point cloud

M Feng, H Hou, L Zhang, Z Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com

In-depth understanding of a 3D scene not only involves locating/recognizing individual
objects, but also requires to infer the relationships and interactions among them. However …

Cited by 4 · Related articles · All 4 versions

[PDF] thecvf.com

3D concept learning and reasoning from multi-view images

Y Hong, C Lin, Y Du, Z Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Humans are able to accurately reason in 3D by gathering multi-view observations of the
surrounding world. Inspired by this insight, we introduce a new large-scale benchmark for …

Cited by 39 · Related articles · All 6 versions

[PDF] arxiv.org

DOMINO: A Dual-System for Multi-step Visual Language Reasoning

P Wang, O Golovneva, A Aghajanyan, X Ren… - arXiv preprint arXiv …, 2023 - arxiv.org

Visual language reasoning requires a system to extract text or numbers from information-
dense images like charts or plots and perform logical or arithmetic reasoning to arrive at an …

Cited by 2 · Related articles · All 3 versions

[PDF] aaai.org

Interpretable3D: An ad-hoc interpretable classifier for 3D point clouds

T Feng, R Quan, X Wang, W Wang… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

3D decision-critical tasks urgently require research on explanations to ensure system
reliability and transparency. Extensive explanatory research has been conducted on 2D …

Cited by 6 · Related articles

Exploring hierarchical spatial layout cues for 3D point cloud based scene graph prediction

M Feng, H Hou, L Zhang, Y Guo, H Yu… - IEEE Transactions …, 2023 - ieeexplore.ieee.org

3D scene graph prediction is important for intelligent agents to gather information and
perceive semantics of their environments. However, constructing an effective graph is …

Cited by 36 · Related articles · All 2 versions

[PDF] thecvf.com

VL-SAT: Visual-linguistic semantics assisted training for 3D semantic scene graph prediction in point cloud

Z Wang, B Cheng, L Zhao, D Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com

The task of 3D semantic scene graph (3DSSG) prediction in the point cloud is challenging
since (1) the 3D point cloud only captures geometric structures with limited semantics …

Cited by 17 · Related articles · All 5 versions
