MuKEA: Multimodal knowledge extraction and accumulation for knowledge-based visual question answering

Y Ding, J Yu, B Liu, Y Hu, M Cui… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Knowledge-based visual question answering requires the ability to associate
external knowledge for open-ended cross-modal scene understanding. One limitation of …

A unified end-to-end retriever-reader framework for knowledge-based VQA

Y Guo, L Nie, Y Wong, Y Liu, Z Cheng… - Proceedings of the 30th …, 2022 - dl.acm.org
Knowledge-based Visual Question Answering (VQA) expects models to rely on external
knowledge for robust answer prediction. Despite its significance, this paper identifies several …

Resolving zero-shot and fact-based visual question answering via enhanced fact retrieval

S Wu, G Zhao, X Qian - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
Practical applications of visual question answering (VQA) systems are challenging, and
recent research has aimed at investigating this important field. Many issues related to real …

Semantic collaborative learning for cross-modal moment localization

Y Hu, K Wang, M Liu, H Tang, L Nie - ACM Transactions on Information …, 2023 - dl.acm.org
Localizing a desired moment within an untrimmed video via a given natural language query,
i.e., cross-modal moment localization, has attracted widespread research attention recently …

HybridPrompt: bridging language models and human priors in prompt tuning for visual question answering

Z Ma, Z Yu, J Li, G Li - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Visual Question Answering (VQA) aims to answer the natural language question
about a given image by understanding multimodal content. However, the answer quality of …

Exploiting the Social-Like Prior in Transformer for Visual Reasoning

Y Han, Y Hu, X Song, H Tang, M Xu… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Benefiting from the instrumental global dependency modeling of self-attention (SA), transformer-
based approaches have become a pivotal choice for numerous downstream visual …

SceneGATE: Scene-graph based co-attention networks for text visual question answering

F Cao, S Luo, F Nunez, Z Wen, J Poon, SC Han - Robotics, 2023 - mdpi.com
Visual Question Answering (VQA) models fail catastrophically on questions related to the
reading of text-carrying images. In contrast, TextVQA aims to answer questions by …

Multimodal Bi-direction guided attention networks for visual question answering

L Cai, N Xu, H Tian, K Chen, H Fan - Neural Processing Letters, 2023 - Springer
Visual question answering (VQA) has recently become a research hotspot in the computer
vision and natural language processing fields. A core challenge in VQA is how to fuse multi …

Bridging the Cross-Modality Semantic Gap in Visual Question Answering

B Wang, Y Ma, X Li, J Gao, Y Hu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The objective of visual question answering (VQA) is to adequately comprehend a question
and identify relevant content in an image that can provide an answer. Existing approaches …

Boosting Visual Question Answering Through Geometric Perception and Region Features

H Yu, Z Wang, Y Liu, H Liu - ECAI 2023, 2023 - ebooks.iospress.nl
Visual question answering (VQA) is a crucial yet challenging task in multimodal
understanding. To correctly answer questions about an image, VQA models are required to …