AA Yusuf, F Chong, M Xianling - Artificial Intelligence Review, 2022 - Springer
Graph neural network is a deep learning approach widely applied on structural and non-structural scenarios due to its substantial performance and interpretability recently. In a non …
M Mathew, D Karatzas… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
We present a new dataset for Visual Question Answering (VQA) on document images called DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images …
We propose a novel multimodal architecture for Scene Text Visual Question Answering (STVQA), named Layout-Aware Transformer (LaTr). The task of STVQA requires models to …
The Remote Embodied Referring Expression (REVERIE) is a recently raised task that requires an agent to navigate to and localise a referred remote object according to a …
G Li, X Wang, W Zhu - Proceedings of the 28th ACM International …, 2020 - dl.acm.org
Given an image and a natural language question, Visual Question Answering (VQA) aims at answering the textual question correctly. Most VQA approaches in the literature focus on finding …
J Wang, H Zhang, H Hong, X Jin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Existing open vocabulary object detection (OVD) works expand the object detector toward open categories by replacing the classifier with the category text embeddings and optimizing …
Given a high-level instruction, the task of Embodied Referring Expression (REVERIE) requires an embodied agent to localise a remote referred object via navigating in the …
G Zeng, Y Zhang, Y Zhou, X Yang - Proceedings of the 29th ACM …, 2021 - dl.acm.org
Text-based visual question answering (TextVQA) requires analyzing both the visual contents and texts in an image to answer a question, which is more practical than general visual …
A Urooj, H Kuehne, B Wu, K Chheu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Answering questions about complex situations in videos requires not only capturing the presence of actors, objects, and their relations, but also the evolution of these relationships …