Spatio-temporal relational reasoning for video question answering

TM Le, V Le, S Venkatesh… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com

Video question answering (VideoQA) is challenging as it requires modeling capacity to distill
dynamic visual artifacts and distant relations and to associate them with linguistic concepts …

Cited by 319 Related articles All 11 versions

[PDF] arxiv.org

DualVGR: A dual-visual graph reasoning unit for video question answering

J Wang, BK Bao, C Xu - IEEE Transactions on Multimedia, 2021 - ieeexplore.ieee.org

Video question answering is a challenging task, which requires agents to be able to
understand rich video contents and perform spatial-temporal reasoning. However, existing …

Cited by 92 Related articles All 4 versions

[PDF] thecvf.com

HAIR: Hierarchical visual-semantic relational reasoning for video question answering

F Liu, J Liu, W Wang, H Lu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com

Relational reasoning is at the heart of video question answering. However, existing
approaches suffer from several common limitations: (1) they only focus on either object-level …

Cited by 61 Related articles All 3 versions

[PDF] arxiv.org

Hierarchical conditional relation networks for multimodal video question answering

TM Le, V Le, S Venkatesh, T Tran - International Journal of Computer …, 2021 - Springer

Video Question Answering (Video QA) challenges modelers in multiple fronts.
Modeling video necessitates building not only spatio-temporal models for the dynamic visual …

Cited by 25 Related articles All 10 versions

3-D Relation Network for visual relation recognition in videos

Q Cao, H Huang, X Shang, B Wang, TS Chua - Neurocomputing, 2021 - Elsevier

Video visual relation recognition aims at mining the dynamic relation instances between
objects in the form of <subject, predicate, object>, such as “person1-towards-person2” and …

Cited by 22 Related articles All 2 versions

[PDF] arxiv.org

Object-centric representation learning for video question answering

LH Dang, TM Le, V Le, T Tran - 2021 International Joint …, 2021 - ieeexplore.ieee.org

Video question answering (Video QA) presents a powerful testbed for human-like intelligent
behaviors. The task demands new capabilities to integrate video processing, language …

Cited by 8 Related articles All 6 versions

TLNet: Temporal Span Localization Network With Collaborative Graph Reasoning for Video Question Answering

L Liang, G Sun, T Li, S Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Video question answering (VideoQA) has witnessed remarkable progress in the past few
years, but there are still challenges in precisely locating question-related segments and …

[PDF] arxiv.org

Video question answering on screencast tutorials

W Zhao, S Kim, N Xu, H Jin - arXiv preprint arXiv:2008.00544, 2020 - arxiv.org

This paper presents a new video question answering task on screencast tutorials. We
introduce a dataset including question, answer and context triples from the tutorial videos for …

Cited by 9 Related articles All 5 versions

[PDF] arxiv.org

GTR: Generalized Grounded Temporal Reasoning for Robot Instruction Following by Combining Large Pre-trained Models

R Arora, N Narendranath, A Tambi… - arXiv preprint arXiv …, 2024 - arxiv.org

Consider the scenario where a human cleans a table and a robot observing the scene is
instructed with the task "Remove the cloth using which I wiped the table". Instruction …

Deep Neural Networks for Visual Reasoning

TM Le - arXiv preprint arXiv:2209.11990, 2022 - arxiv.org

Visual perception and language understanding are fundamental components of human
intelligence, enabling them to understand and reason about objects and their interactions. It …
