Video question answering via gradually refined attention over appearance and motion

Authors

Dejing Xu, Zhou Zhao, Jun Xiao, Fei Wu, Hanwang Zhang, Xiangnan He, Yueting Zhuang

Publication date

2017/10/23

Conference

Proceedings of the 25th ACM international conference on Multimedia

Pages

1645-1653

Publisher

ACM

Description

Recently image question answering (ImageQA) has gained lots of attention in the research community. However, as its natural extension, video question answering (VideoQA) is less explored. Although both tasks look similar, VideoQA is more challenging mainly because of the complexity and diversity of videos. As such, simply extending the ImageQA methods to videos is insufficient and suboptimal. Particularly, working with the video needs to model its inherent temporal structure and analyze the diverse information it contains. In this paper, we consider exploiting the appearance and motion information residing in the video with a novel attention mechanism. More specifically, we propose an end-to-end model which gradually refines its attention over the appearance and motion features of the video using the question as guidance. The question is processed word by word until the model generates the final …

Total citations

Cited by 485

Citations per year: 2018: 10, 2019: 29, 2020: 26, 2021: 58, 2022: 75, 2023: 172, 2024: 113
