Related articles - Academic Resource Search

Language-Guided Visual Aggregation Network for Video Question Answering

X Liang, D Wang, Q Wang, B Wan, L An… - Proceedings of the 31st …, 2023 - dl.acm.org

Video Question Answering (VideoQA) aims to comprehend intricate relationships, actions,
and events within video content, as well as the inherent links between objects and scenes …

Cited by: 1

[PDF] arxiv.org

Relation-aware hierarchical attention framework for video question answering

F Li, T Bai, C Cao, Z Liu, C Yan, B Wu - Proceedings of the 2021 …, 2021 - dl.acm.org

Video Question Answering (VideoQA) is a challenging video understanding task since it
requires a deep understanding of both question and video. Previous studies mainly focus on …

Cited by: 12 · All 6 versions

Generalized pyramid co-attention with learnable aggregation net for video question answering

L Gao, T Chen, X Li, P Zeng, L Zhao, YF Li - Pattern Recognition, 2021 - Elsevier

Video based visual question answering (V-VQA) remains challenging at the intersection of
vision and language. In this paper, we propose a novel architecture, namely Generalized …

Cited by: 8 · All 4 versions

Lightweight recurrent cross-modal encoder for video question answering

SA Immanuel, C Jeong - Knowledge-Based Systems, 2023 - Elsevier

A video question answering task essentially boils down to how to fuse the information
between text and video effectively to predict an answer. Most works employ a transformer …

Cited by: 1 · All 2 versions

[PDF] arxiv.org

DualVGR: A dual-visual graph reasoning unit for video question answering

J Wang, BK Bao, C Xu - IEEE Transactions on Multimedia, 2021 - ieeexplore.ieee.org

Video question answering is a challenging task, which requires agents to be able to
understand rich video contents and perform spatial-temporal reasoning. However, existing …

Cited by: 74 · All 4 versions

[PDF] thecvf.com

Heterogeneous memory enhanced multimodal attention model for video question answering

C Fan, X Zhang, S Zhang, W Wang… - Proceedings of the …, 2019 - openaccess.thecvf.com

In this paper, we propose a novel end-to-end trainable Video Question Answering
(VideoQA) framework with three major components: 1) a new heterogeneous memory which …

Cited by: 305 · All 8 versions

[PDF] smu.edu.sg

Action-centric relation transformer network for video question answering

J Zhang, J Shao, R Cao, L Gao, X Xu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

Video question answering (VideoQA) has emerged as a popular research topic in recent
years. Enormous efforts have been devoted to developing more effective fusion strategies …

Cited by: 35 · All 4 versions

Progressive graph attention network for video question answering

L Peng, S Yang, Y Bin, G Wang - Proceedings of the 29th ACM …, 2021 - dl.acm.org

Video question answering (Video-QA) is a task of answering a natural language question
related to the content of a video. Existing methods generally explore the single interactions …

Cited by: 37

[PDF] thecvf.com

Language-aware Visual Semantic Distillation for Video Question Answering

B Zou, C Yang, Y Qiao, C Quan… - Proceedings of the …, 2024 - openaccess.thecvf.com

Significant advancements in video question answering (VideoQA) have been made thanks
to thriving large image-language pretraining frameworks. Although these image-language …

Cross-attentional spatio-temporal semantic graph networks for video question answering

Y Liu, X Zhang, F Huang, B Zhang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Due to the rich spatio-temporal visual content and complex multimodal relations, Video
Question Answering (VideoQA) has become a challenging task and attracted increasing …

Cited by: 29 · All 4 versions
