Authors
Dejing Xu, Zhou Zhao, Jun Xiao, Fei Wu, Hanwang Zhang, Xiangnan He, Yueting Zhuang
Publication date
2017/10/23
Conference
Proceedings of the 25th ACM international conference on Multimedia
Pages
1645-1653
Publisher
ACM
Description
Recently, image question answering (ImageQA) has gained considerable attention in the research community. However, its natural extension, video question answering (VideoQA), is less explored. Although the two tasks look similar, VideoQA is more challenging, mainly because of the complexity and diversity of videos. As such, simply extending ImageQA methods to videos is insufficient and suboptimal. In particular, working with video requires modeling its inherent temporal structure and analyzing the diverse information it contains. In this paper, we consider exploiting the appearance and motion information residing in the video with a novel attention mechanism. More specifically, we propose an end-to-end model that gradually refines its attention over the appearance and motion features of the video using the question as guidance. The question is processed word by word until the model generates the final …
Total citations
[Bar chart: citations per year, 2018 to 2024]
Scholar articles
D Xu, Z Zhao, J Xiao, F Wu, H Zhang, X He, Y Zhuang - Proceedings of the 25th ACM international conference …, 2017