C Fan, X Zhang, S Zhang, W Wang… - Proceedings of the …, 2019 - openaccess.thecvf.com
In this paper, we propose a novel end-to-end trainable Video Question Answering (VideoQA) framework with three major components: 1) a new heterogeneous memory which …
B Zou, C Yang, Y Qiao, C Quan… - Proceedings of the …, 2024 - openaccess.thecvf.com
Significant advancements in video question answering (VideoQA) have been made thanks to thriving large image-language pretraining frameworks. Although these image-language …
H Wang, D Guo, XS Hua, M Wang - Proceedings of the 29th ACM …, 2021 - dl.acm.org
Video Question Answering (VideoQA) is a challenging problem, as it requires a joint understanding of video and natural language questions. Existing methods perform correlation …
B Zou, C Yang, Y Qiao, C Quan, Y Zhao - arXiv preprint arXiv:2404.00973, 2024 - arxiv.org
Significant advancements in video question answering (VideoQA) have been made thanks to thriving large image-language pretraining frameworks. Although these image-language …
X Liang, D Wang, Q Wang, B Wan, L An… - Proceedings of the 31st …, 2023 - dl.acm.org
Video Question Answering (VideoQA) aims to comprehend intricate relationships, actions, and events within video content, as well as the inherent links between objects and scenes …
J Park, J Lee, K Sohn - … of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
This paper presents a novel method, termed Bridge to Answer, to infer correct answers for questions about a given video by leveraging adequate graph interactions of heterogeneous …
This paper identifies two kinds of redundancy in the current VideoQA paradigm. Specifically, the current video encoders tend to holistically embed all video clues at different granularities …
M Peng, C Wang, Y Gao, Y Shi, XD Zhou - arXiv preprint arXiv:2109.04735, 2021 - arxiv.org
Video question answering (VideoQA) is challenging given its multimodal combination of visual understanding and natural language understanding. While existing approaches …
J Liang, X Meng, Y Wang, C Liu, Q Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Video Question Answering (VideoQA) has emerged as a challenging frontier in the field of multimedia processing, requiring intricate interactions between visual and textual modalities …