Learning to Agree on Vision Attention for Visual Commonsense Reasoning

Authors

Zhenyang Li, Yangyang Guo, Kejie Wang, Fan Liu, Liqiang Nie, Mohan Kankanhalli

Publication date

2023/2/4

Journal

IEEE Transactions on Multimedia

Description

Visual Commonsense Reasoning (VCR) remains a significant yet challenging research problem in the realm of visual reasoning. A VCR model generally aims at answering a textual question regarding an image, followed by the rationale prediction for the preceding answering process. Though these two processes are sequential and intertwined, existing methods always consider them as two independent matching-based instances. They, therefore, ignore the pivotal relationship between the two processes, leading to sub-optimal model performance. This paper presents a novel visual attention alignment method to efficaciously handle these two processes in a unified framework. To achieve this, we first design a re-attention module for aggregating the vision attention map produced in each process. Thereafter, the resultant two sets of attention maps are carefully aligned to guide the two processes to make decisions …

Total citations

Cited by 7

Citations per year: 2023: 2, 2024: 5

Scholar articles

Learning to agree on vision attention for visual commonsense reasoning

Z Li, Y Guo, K Wang, F Liu, L Nie, M Kankanhalli - IEEE Transactions on Multimedia, 2023
