Language-Guided Visual Aggregation Network for Video Question Answering

X Liang, D Wang, Q Wang, B Wan, L An… - Proceedings of the 31st …, 2023 - dl.acm.org
Video Question Answering (VideoQA) aims to comprehend intricate relationships, actions,
and events within video content, as well as the inherent links between objects and scenes …

Relation-aware hierarchical attention framework for video question answering

F Li, T Bai, C Cao, Z Liu, C Yan, B Wu - Proceedings of the 2021 …, 2021 - dl.acm.org
Video Question Answering (VideoQA) is a challenging video understanding task since it
requires a deep understanding of both question and video. Previous studies mainly focus on …

Generalized pyramid co-attention with learnable aggregation net for video question answering

L Gao, T Chen, X Li, P Zeng, L Zhao, YF Li - Pattern Recognition, 2021 - Elsevier
Video based visual question answering (V-VQA) remains challenging at the intersection of
vision and language. In this paper, we propose a novel architecture, namely Generalized …

Lightweight recurrent cross-modal encoder for video question answering

SA Immanuel, C Jeong - Knowledge-Based Systems, 2023 - Elsevier
A video question answering task essentially boils down to how to fuse the information
between text and video effectively to predict an answer. Most works employ a transformer …

DualVGR: A dual-visual graph reasoning unit for video question answering

J Wang, BK Bao, C Xu - IEEE Transactions on Multimedia, 2021 - ieeexplore.ieee.org
Video question answering is a challenging task, which requires agents to be able to
understand rich video contents and perform spatial-temporal reasoning. However, existing …

Heterogeneous memory enhanced multimodal attention model for video question answering

C Fan, X Zhang, S Zhang, W Wang… - Proceedings of the …, 2019 - openaccess.thecvf.com
In this paper, we propose a novel end-to-end trainable Video Question Answering
(VideoQA) framework with three major components: 1) a new heterogeneous memory which …

Action-centric relation transformer network for video question answering

J Zhang, J Shao, R Cao, L Gao, X Xu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Video question answering (VideoQA) has emerged as a popular research topic in recent
years. Enormous efforts have been devoted to developing more effective fusion strategies …

Progressive graph attention network for video question answering

L Peng, S Yang, Y Bin, G Wang - Proceedings of the 29th ACM …, 2021 - dl.acm.org
Video question answering (Video-QA) is a task of answering a natural language question
related to the content of a video. Existing methods generally explore the single interactions …

Language-aware Visual Semantic Distillation for Video Question Answering

B Zou, C Yang, Y Qiao, C Quan… - Proceedings of the …, 2024 - openaccess.thecvf.com
Significant advancements in video question answering (VideoQA) have been made thanks
to thriving large image-language pretraining frameworks. Although these image-language …

Cross-attentional spatio-temporal semantic graph networks for video question answering

Y Liu, X Zhang, F Huang, B Zhang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Due to the rich spatio-temporal visual content and complex multimodal relations, Video
Question Answering (VideoQA) has become a challenging task and attracted increasing …