Interpretable visual reasoning via probabilistic formulation under natural supervision

M Ding, Z Chen, T Du, P Luo… - Advances In Neural …, 2021 - proceedings.neurips.cc

In this work, we propose a unified framework, called Visual Reasoning with Differentiable
Physics (VRDP), that can jointly learn visual concepts and infer physics models of objects …

Cited by 79 · Related articles · All 8 versions

Explicit cross-modal representation learning for visual commonsense reasoning

X Zhang, F Zhang, C Xu - IEEE Transactions on Multimedia, 2021 - ieeexplore.ieee.org

Given a question about an image, Visual Commonsense Reasoning (VCR) needs to provide
not only a correct answer, but also a rationale to justify the answer. VCR is a challenging …

Cited by 29 · Related articles · All 2 versions

Class-incremental instance segmentation via multi-teacher networks

Y Gu, C Deng, K Wei - Proceedings of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org

Although deep neural networks have achieved amazing results on instance segmentation,
they are still ill-equipped when they are required to learn new tasks incrementally …

Cited by 31 · Related articles · All 5 versions

Semantic-aware modular capsule routing for visual question answering

Y Han, J Yin, J Wu, Y Wei, L Nie - IEEE Transactions on Image …, 2023 - ieeexplore.ieee.org

Visual Question Answering (VQA) is fundamentally compositional in nature, and many
questions are simply answered by decomposing them into modular sub-problems. The …

Cited by 15 · Related articles · All 7 versions

General greedy de-bias learning

X Han, S Wang, C Su, Q Huang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Neural networks often make predictions relying on the spurious correlations from the
datasets rather than the intrinsic properties of the task of interest, facing with sharp …

Cited by 11 · Related articles · All 7 versions

MSAM: Deep Semantic Interaction Network for Visual Question Answering

F Wang, B Wang, F Xu, J Li, P Liu - International Conference on …, 2023 - Springer

In Visual Question Answering (VQA) task, extracting semantic information from
multimodalities and effectively utilizing this information for interaction is crucial. Existing VQA …

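Image- and video-centered cross-media analysis and reasoning

Q Huang, S Wang, Q Xu, L Li, S Jiang - CAAI Transactions on Intelligent Systems, 2021 - html.rhhz.net

How to bridge the "heterogeneity gap" and the "semantic gap" between cross-media data and
cross-media knowledge, and how to manage and exploit massive volumes of cross-media data
effectively, is a bottleneck that the next generation of artificial intelligence urgently needs
to overcome. Focusing on cross-media data represented by images and videos …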
