Advancing Video Question Answering with a Multi-modal and Multi-layer Question Enhancement Network

Authors

Meng Liu, Fenglei Zhang, Xin Luo, Fan Liu, Yinwei Wei, Liqiang Nie

Publication date

2023/10/26

Book

Proceedings of the 31st ACM International Conference on Multimedia

Pages

3985-3993

Description

Video question answering is an increasingly vital research field, spurred by the rapid proliferation of video content online and the urgent need for intelligent systems that can comprehend and interact with this content. Existing methodologies often lean towards video understanding and cross-modal information interaction modeling but tend to overlook the crucial aspect of comprehensive question understanding. To address this gap, we introduce the multi-modal and multi-layer question enhancement network, a groundbreaking framework emphasizing nuanced question understanding. Our approach begins by extracting object, appearance, and motion features from videos. Subsequently, we harness multi-layer outputs from a pre-trained language model, ensuring a thorough grasp of the question. Integrating object data into appearance is guided by global question and frame representation, facilitating the adaptive …
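The description names two mechanisms without spelling them out: drawing on the outputs of several layers of a pre-trained language model to represent the question, and folding object features into appearance features under the guidance of question and frame representations. The sketch that follows is a minimal illustration of generic versions of both. It assumes an ELMo-style softmax-weighted sum over layer outputs and a single sigmoid gate conditioned on the concatenated question and frame vectors. It is not the authors' implementation: the function names (`mix_layers`, `gated_object_fusion`), the parameters (`layer_logits`, `gate_weights`), and the gating form are all hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_layers(layer_outputs: List[Vector], layer_logits: List[float]) -> Vector:
    """Hypothetical helper: softmax-weighted sum of per-layer question vectors
    (an ELMo-style scalar mix), with one logit per language-model layer."""
    if len(layer_outputs) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)
    dim = len(layer_outputs[0])
    return [
        sum(w * layer[d] for w, layer in zip(weights, layer_outputs))
        for d in range(dim)
    ]


def sigmoid(x: float) -> float:
    # Stable logistic function that avoids overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_object_fusion(
    appearance: Vector,
    obj: Vector,
    question: Vector,
    frame: Vector,
    gate_weights: Vector,
) -> Vector:
    """Hypothetical helper: add object features to appearance features,
    scaled by a scalar gate computed from the [question; frame] context."""
    context = question + frame  # list concatenation acts as vector concatenation
    if len(gate_weights) != len(context):
        raise ValueError("gate_weights must match len(question) + len(frame)")
    gate = sigmoid(sum(w * c for w, c in zip(gate_weights, context)))
    return [a + gate * o for a, o in zip(appearance, obj)]


if __name__ == "__main__":
    q = mix_layers([[1.0, 0.0], [0.0, 1.0]], layer_logits=[0.0, 0.0])
    print(q)  # [0.5, 0.5]
    fused = gated_object_fusion([1.0, 1.0], [2.0, 4.0], q, [0.7], [0.0, 0.0, 0.0])
    print(fused)  # [2.0, 3.0]
```

With equal layer logits the mix reduces to the plain average of the layers, and with all-zero gate weights the gate equals 0.5, so half of the object vector is added to the appearance vector. In a trained model, `layer_logits` and `gate_weights` would be learned parameters rather than fixed inputs.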

Total citations

Cited by 2

2024: 2
