Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Counterfactual critic multi-agent training for scene graph generation

L Chen, H Zhang, J Xiao, X He… - Proceedings of the …, 2019 - openaccess.thecvf.com
Scene graphs (objects as nodes and visual relationships as edges) describe the
whereabouts and interactions of objects in an image for comprehensive scene …

Trends in integration of vision and language research: A survey of tasks, datasets, and methods

A Mogadala, M Kalimuthu, D Klakow - Journal of Artificial Intelligence …, 2021 - jair.org
Interest in Artificial Intelligence (AI) and its applications has seen unprecedented
growth in the last few years. This success can be partly attributed to the advancements made …

Re-attention for visual question answering

W Guo, Y Zhang, J Yang, X Yuan - IEEE Transactions on Image …, 2021 - ieeexplore.ieee.org
A simultaneous understanding of questions and images is crucial in Visual Question
Answering (VQA). While the existing models have achieved satisfactory performance by …

Visual relationship detection: A survey

J Cheng, L Wang, J Wu, X Hu, G Jeon… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
Visual relationship detection (VRD) is a newly developed computer vision task, aiming to
recognize relations or interactions between objects in an image. It is a further learning task …

CoG-DQA: Chain-of-Guiding Learning with Large Language Models for Diagram Question Answering

S Wang, L Zhang, L Zhu, T Qin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diagram Question Answering (DQA) is a challenging task requiring models to
answer natural language questions based on visual diagram contexts. It serves as a crucial …

A human-like traffic scene understanding system: A survey

ZX Xia, WC Lai, LW Tsao, LF Hsu… - IEEE Industrial …, 2020 - ieeexplore.ieee.org
Autonomous vehicles, also known as self-driving cars, have the capability to perceive the
environment, locate their position, and safely drive to the destination without any human …

Scenegate: Scene-graph based co-attention networks for text visual question answering

F Cao, S Luo, F Nunez, Z Wen, J Poon, SC Han - Robotics, 2023 - mdpi.com
Visual Question Answering (VQA) models fail catastrophically on questions related to the
reading of text-carrying images. However, TextVQA aims to answer questions by …

DisAVR: Disentangled Adaptive Visual Reasoning Network for Diagram Question Answering

Y Wang, B Wei, J Liu, L Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Diagram Question Answering (DQA) aims to correctly answer questions about given
diagrams, which demands an interplay of good diagram understanding and effective …

Alignment-Guided Self-Supervised Learning for Diagram Question Answering

S Wang, L Zhang, W Wu, T Qin… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Diagram question answering (DQA), which is defined as answering natural language
questions according to the visual diagram context, has attracted attention and has recently …