Authors
Meng Liu, Fenglei Zhang, Xin Luo, Fan Liu, Yinwei Wei, Liqiang Nie
Publication date
2023/10/26
Book
Proceedings of the 31st ACM International Conference on Multimedia
Pages
3985-3993
Description
Video question answering is an increasingly vital research field, spurred by the rapid proliferation of video content online and the urgent need for intelligent systems that can comprehend and interact with this content. Existing methodologies often lean towards video understanding and cross-modal information interaction modeling but tend to overlook the crucial aspect of comprehensive question understanding. To address this gap, we introduce the multi-modal and multi-layer question enhancement network, a groundbreaking framework emphasizing nuanced question understanding. Our approach begins by extracting object, appearance, and motion features from videos. Subsequently, we harness multi-layer outputs from a pre-trained language model, ensuring a thorough grasp of the question. Integrating object data into appearance is guided by global question and frame representation, facilitating the adaptive …