作者
Zhenyang Li, Yangyang Guo, Kejie Wang, Fan Liu, Liqiang Nie, Mohan Kankanhalli
发表日期
2023/2/4
期刊
IEEE Transactions on Multimedia
简介
Visual Commonsense Reasoning (VCR) remains a significant yet challenging research problem in the realm of visual reasoning. A VCR model generally aims at answering a textual question regarding an image, followed by the rationale prediction for the preceding answering process. Though these two processes are sequential and intertwined, existing methods always consider them as two independent matching-based instances. They, therefore, ignore the pivotal relationship between the two processes, leading to sub-optimal model performance. This paper presents a novel visual attention alignment method to efficaciously handle these two processes in a unified framework. To achieve this, we first design a re-attention module for aggregating the vision attention map produced in each process. Thereafter, the resultant two sets of attention maps are carefully aligned to guide the two processes to make decisions …
引用总数
学术搜索中的文章
Z Li, Y Guo, K Wang, F Liu, L Nie, M Kankanhalli - IEEE Transactions on Multimedia, 2023