3D-Aware Visual Question Answering about Parts, Poses and Occlusions

3 results (0.02 seconds)

ImageNet3D: Towards General-Purpose Object-Level 3D Understanding

W Ma, G Zeng, G Zhang, Q Liu, L Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

A vision model with general-purpose object-level 3D understanding should be capable of
inferring both 2D (e.g., class name and bounding box) and 3D information (e.g., 3D location …

Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

X Wang, W Ma, A Wang, S Chen, A Kortylewski… - arXiv preprint arXiv …, 2024 - arxiv.org

For vision-language models (VLMs), understanding the dynamic properties of objects and
their interactions within 3D scenes from video is crucial for effective reasoning. In this work …

LCV2: A Universal Pretraining-Free Framework for Grounded Visual Question Answering

Y Chen, L Su, L Chen, Z Lin - Electronics, 2024 - mdpi.com

Grounded Visual Question Answering systems place heavy reliance on substantial
computational power and data resources in pretraining. In response to this challenge, this …
