Hierarchical conditional relation networks for video question answering

TM Le, V Le, S Venkatesh… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Video question answering (VideoQA) is challenging as it requires modeling capacity to distill
dynamic visual artifacts and distant relations and to associate them with linguistic concepts …

DualVGR: A dual-visual graph reasoning unit for video question answering

J Wang, BK Bao, C Xu - IEEE Transactions on Multimedia, 2021 - ieeexplore.ieee.org
Video question answering is a challenging task, which requires agents to be able to
understand rich video contents and perform spatial-temporal reasoning. However, existing …

HAIR: Hierarchical visual-semantic relational reasoning for video question answering

F Liu, J Liu, W Wang, H Lu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Relational reasoning is at the heart of video question answering. However, existing
approaches suffer from several common limitations: (1) they only focus on either object-level …

Hierarchical conditional relation networks for multimodal video question answering

TM Le, V Le, S Venkatesh, T Tran - International Journal of Computer …, 2021 - Springer
Video Question Answering (Video QA) challenges modelers in multiple fronts.
Modeling video necessitates building not only spatio-temporal models for the dynamic visual …

3-D Relation Network for visual relation recognition in videos

Q Cao, H Huang, X Shang, B Wang, TS Chua - Neurocomputing, 2021 - Elsevier
Video visual relation recognition aims at mining the dynamic relation instances between
objects in the form of <subject, predicate, object>, such as “person1-towards-person2” and …

Object-centric representation learning for video question answering

LH Dang, TM Le, V Le, T Tran - 2021 International Joint …, 2021 - ieeexplore.ieee.org
Video question answering (Video QA) presents a powerful testbed for human-like intelligent
behaviors. The task demands new capabilities to integrate video processing, language …

TLNet: Temporal Span Localization Network With Collaborative Graph Reasoning for Video Question Answering

L Liang, G Sun, T Li, S Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Video question answering (VideoQA) has witnessed remarkable progress in the past few
years, but there are still challenges in precisely locating question-related segments and …

Video question answering on screencast tutorials

W Zhao, S Kim, N Xu, H Jin - arXiv preprint arXiv:2008.00544, 2020 - arxiv.org
This paper presents a new video question answering task on screencast tutorials. We
introduce a dataset including question, answer and context triples from the tutorial videos for …

GTR: Generalized Grounded Temporal Reasoning for Robot Instruction Following by Combining Large Pre-trained Models

R Arora, N Narendranath, A Tambi… - arXiv preprint arXiv …, 2024 - arxiv.org
Consider the scenario where a human cleans a table and a robot observing the scene is
instructed with the task "Remove the cloth using which I wiped the table". Instruction …

Deep Neural Networks for Visual Reasoning

TM Le - arXiv preprint arXiv:2209.11990, 2022 - arxiv.org
Visual perception and language understanding are fundamental components of human
intelligence, enabling them to understand and reason about objects and their interactions. It …