Dual hierarchical temporal convolutional network with QA-aware dynamic normalization for...

L Li, T Jin, W Lin, H Jiang, W Pan… - … on Circuits and …, 2023 - ieeexplore.ieee.org

Recent methods for video question answering (VideoQA), aiming to generate answers
based on given questions and video content, have made significant progress in cross-modal …

Cited by 11 · Related articles

A universal quaternion hypergraph network for multimodal video question answering

Z Guo, J Zhao, L Jiao, X Liu, F Liu - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Fusion and interaction of multimodal features are essential for video question answering.
Structural information composed of the relationships between different objects in videos is …

Cited by 27 · Related articles

[PDF] arxiv.org

Long Story Short: a Summarize-then-Search Method for Long Video Question Answering

J Chung, Y Yu - arXiv preprint arXiv:2311.01233, 2023 - arxiv.org

Large language models such as GPT-3 have demonstrated an impressive capability to
adapt to new tasks without requiring task-specific training data. This capability has been …

Cited by 3 · Related articles · All 4 versions

[PDF] acm.org

Hierarchical Synergy-Enhanced Multimodal Relational Network for Video Question Answering

M Peng, X Shao, Y Shi, X Zhou - ACM Transactions on Multimedia …, 2023 - dl.acm.org

Video question answering (VideoQA) is challenging as it requires reasoning about natural
language and multimodal interactive relations. Most existing methods apply attention …

Cited by 3 · Related articles · All 2 versions

[PDF] arxiv.org

Temporal pyramid transformer with multimodal interaction for video question answering

M Peng, C Wang, Y Gao, Y Shi, XD Zhou - arXiv preprint arXiv:2109.04735, 2021 - arxiv.org

Video question answering (VideoQA) is challenging given its multimodal combination of
visual understanding and natural language understanding. While existing approaches …

Cited by 5 · Related articles · All 2 versions

Multi-Semantic Alignment Co-Reasoning Network for Video Question Answering

M Peng, L Liu, Z Li, Y Shi, X Zhou - 2023 IEEE International …, 2023 - ieeexplore.ieee.org

Video question answering challenges models on understanding textual questions with
varying complexity and searching for clues from visual content with different hierarchical …

Cited by 1 · Related articles

[PDF] arxiv.org

Triple Attention Network architecture for MovieQA

A Shah, TH Lin, S Wu - arXiv preprint arXiv:2111.09531, 2021 - arxiv.org

Movie question answering, or MovieQA is a multimedia related task wherein one is provided
with a video, the subtitle information, a question and candidate answers for it. The task is to …

Time-Evolving Conditional Character-centric Graphs for Movie Understanding

LH Dang, TM Le, V Le, TM Phuong, T Tran - NeurIPS 2022 Temporal … - openreview.net

Temporal graph structure learning for long-term human-centric video understanding is
promising but remains challenging due to the scarcity of dense graph annotations for long …
