From representation to reasoning: Towards both evidence and commonsense reasoning for video question-answering

J Li, L Niu, L Zhang - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
Video understanding has achieved great success in representation learning, such as video
captioning, video object grounding, and descriptive video question answering. However, current …

NExT-QA: Next phase of question-answering to explaining temporal actions

J Xiao, X Shang, A Yao… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
We introduce NExT-QA, a rigorously designed video question answering (VideoQA)
benchmark to advance video understanding from describing to explaining the temporal …

Discovering the real association: Multimodal causal reasoning in video question answering

C Zang, H Wang, M Pei… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Video Question Answering (VideoQA) is challenging as it requires capturing
accurate correlations between modalities from redundant information. Recent methods focus …

Video graph transformer for video question answering

J Xiao, P Zhou, TS Chua, S Yan - European Conference on Computer …, 2022 - Springer
This paper proposes a Video Graph Transformer (VGT) model for Video Question Answering
(VideoQA). VGT's uniqueness is two-fold: 1) it designs a dynamic graph transformer …

IntentQA: Context-aware video intent reasoning

J Li, P Wei, W Han, L Fan - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
In this paper, we propose a novel task IntentQA, a special VideoQA task focusing on video
intent reasoning, which has become increasingly important for AI with its advantages in …

Invariant grounding for video question answering

Y Li, X Wang, J Xiao, W Ji… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Video Question Answering (VideoQA) is the task of answering questions about a
video. At its core is understanding the alignments between visual scenes in video and …

TVQA+: Spatio-temporal grounding for video question answering

J Lei, L Yu, TL Berg, M Bansal - arXiv preprint arXiv:1904.11574, 2019 - arxiv.org
We present the task of Spatio-Temporal Video Question Answering, which requires
intelligent systems to simultaneously retrieve relevant moments and detect referenced visual …

Discovering spatio-temporal rationales for video question answering

Y Li, J Xiao, C Feng, X Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper strives to solve complex video question answering (VideoQA) which features
long videos containing multiple objects and events at different times. To tackle the challenge …

DualVGR: A dual-visual graph reasoning unit for video question answering

J Wang, BK Bao, C Xu - IEEE Transactions on Multimedia, 2021 - ieeexplore.ieee.org
Video question answering is a challenging task, which requires agents to be able to
understand rich video contents and perform spatial-temporal reasoning. However, existing …

Visual causal scene refinement for video question answering

Y Wei, Y Liu, H Yan, G Li, L Lin - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Existing methods for video question answering (VideoQA) often suffer from spurious
correlations between different modalities, leading to a failure in identifying the dominant …