Related articles - Academic resource search

From representation to reasoning: Towards both evidence and commonsense reasoning for video question-answering

J Li, L Niu, L Zhang - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com

Video understanding has achieved great success in representation learning, such as video
caption, video object grounding, and video descriptive question-answer. However, current …

Cited by 45 | Related articles | All 5 versions

[PDF] thecvf.com

NExT-QA: Next phase of question-answering to explaining temporal actions

J Xiao, X Shang, A Yao… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

We introduce NExT-QA, a rigorously designed video question answering (VideoQA)
benchmark to advance video understanding from describing to explaining the temporal …

Cited by 220 | Related articles | All 6 versions

[PDF] thecvf.com

Discovering the real association: Multimodal causal reasoning in video question answering

C Zang, H Wang, M Pei… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Video Question Answering (VideoQA) is challenging as it requires capturing
accurate correlations between modalities from redundant information. Recent methods focus …

Cited by 14 | Related articles | All 4 versions

[PDF] arxiv.org

Video graph transformer for video question answering

J Xiao, P Zhou, TS Chua, S Yan - European Conference on Computer …, 2022 - Springer

This paper proposes a Video Graph Transformer (VGT) model for Video Question Answering
(VideoQA). VGT's uniqueness are two-fold: 1) it designs a dynamic graph transformer …

Cited by 70 | Related articles | All 6 versions

[PDF] thecvf.com

IntentQA: Context-aware video intent reasoning

J Li, P Wei, W Han, L Fan - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

In this paper, we propose a novel task IntentQA, a special VideoQA task focusing on video
intent reasoning, which has become increasingly important for AI with its advantages in …

Cited by 13 | Related articles | All 5 versions

[PDF] thecvf.com

Invariant grounding for video question answering

Y Li, X Wang, J Xiao, W Ji… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

Video Question Answering (VideoQA) is the task of answering questions about a
video. At its core is understanding the alignments between visual scenes in video and …

Cited by 100 | Related articles | All 5 versions

[PDF] arxiv.org

TVQA+: Spatio-temporal grounding for video question answering

J Lei, L Yu, TL Berg, M Bansal - arXiv preprint arXiv:1904.11574, 2019 - arxiv.org

We present the task of Spatio-Temporal Video Question Answering, which requires
intelligent systems to simultaneously retrieve relevant moments and detect referenced visual …

Cited by 230 | Related articles | All 4 versions

[PDF] thecvf.com

Discovering spatio-temporal rationales for video question answering

Y Li, J Xiao, C Feng, X Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

This paper strives to solve complex video question answering (VideoQA) which features
long videos containing multiple objects and events at different time. To tackle the challenge …

Cited by 6 | Related articles | All 5 versions

[PDF] arxiv.org

DualVGR: A dual-visual graph reasoning unit for video question answering

J Wang, BK Bao, C Xu - IEEE Transactions on Multimedia, 2021 - ieeexplore.ieee.org

Video question answering is a challenging task, which requires agents to be able to
understand rich video contents and perform spatial-temporal reasoning. However, existing …

Cited by 73 | Related articles | All 4 versions

[PDF] arxiv.org

Visual causal scene refinement for video question answering

Y Wei, Y Liu, H Yan, G Li, L Lin - Proceedings of the 31st ACM …, 2023 - dl.acm.org

Existing methods for video question answering (VideoQA) often suffer from spurious
correlations between different modalities, leading to a failure in identifying the dominant …

Cited by 12 | Related articles | All 3 versions
