Dynamic visual reasoning by learning differentiable physics models from video and language

M Ding, Z Chen, T Du, P Luo… - Advances In Neural …, 2021 - proceedings.neurips.cc
In this work, we propose a unified framework, called Visual Reasoning with Differentiable
Physics (VRDP), that can jointly learn visual concepts and infer physics models of objects …

Explicit cross-modal representation learning for visual commonsense reasoning

X Zhang, F Zhang, C Xu - IEEE Transactions on Multimedia, 2021 - ieeexplore.ieee.org
Given a question about an image, Visual Commonsense Reasoning (VCR) needs to provide
not only a correct answer, but also a rationale to justify the answer. VCR is a challenging …

Class-incremental instance segmentation via multi-teacher networks

Y Gu, C Deng, K Wei - Proceedings of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org
Although deep neural networks have achieved amazing results on instance segmentation,
they are still ill-equipped when they are required to learn new tasks incrementally …

Semantic-aware modular capsule routing for visual question answering

Y Han, J Yin, J Wu, Y Wei, L Nie - IEEE Transactions on Image …, 2023 - ieeexplore.ieee.org
Visual Question Answering (VQA) is fundamentally compositional in nature, and many
questions are simply answered by decomposing them into modular sub-problems. The …

General greedy de-bias learning

X Han, S Wang, C Su, Q Huang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Neural networks often make predictions relying on the spurious correlations from the
datasets rather than the intrinsic properties of the task of interest, facing with sharp …

MSAM: Deep Semantic Interaction Network for Visual Question Answering

F Wang, B Wang, F Xu, J Li, P Liu - International Conference on …, 2023 - Springer
In Visual Question Answering (VQA) task, extracting semantic information from
multimodalities and effectively utilizing this information for interaction is crucial. Existing VQA …

Image- and video-centered cross-media analysis and reasoning

Q Huang, S Wang, Q Xu, L Li, S Jiang - CAAI Transactions on Intelligent Systems, 2021 - html.rhhz.net
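How to bridge the "heterogeneity gap" and the "semantic gap" between cross-media data and
cross-media knowledge, and how to manage and use massive volumes of cross-media data
effectively, is a bottleneck that the development of next-generation artificial intelligence
urgently needs to break through. Focusing on cross-media data represented by images and video …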