The multi-modal fusion in visual question answering: a review of attention mechanisms

S Lu, M Liu, L Yin, Z Yin, X Liu, W Zheng - PeerJ Computer Science, 2023 - peerj.com
Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …

Positional attention guided transformer-like architecture for visual question answering

A Mao, Z Yang, K Lin, J Xuan… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Transformer architectures have recently been introduced into the field of visual question
answering (VQA), due to their powerful capabilities of information extraction and fusion …

Contrastive region guidance: Improving grounding in vision-language models without training

D Wan, J Cho, E Stengel-Eskin, M Bansal - arXiv preprint arXiv …, 2024 - arxiv.org
Highlighting particularly relevant regions of an image can improve the performance of vision-
language models (VLMs) on various vision-language (VL) tasks by guiding the model to …

Data efficient masked language modeling for vision and language

Y Bitton, G Stanovsky, M Elhadad… - arXiv preprint arXiv …, 2021 - arxiv.org
Masked language modeling (MLM) is one of the key sub-tasks in vision-language
pretraining. In the cross-modal setting, tokens in the sentence are masked at random, and …
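
As a point of reference for the random masking this snippet describes, the sketch below shows a plain BERT-style masking pass over the sentence tokens only. The 15% rate, the MASK_ID value, and the helper name are illustrative assumptions, not details of the paper's data-efficient strategy.

```python
import random

# Assumed values for illustration; they are common BERT-style defaults,
# not taken from the cited paper.
MASK_ID = 103
MASK_PROB = 0.15

def mask_tokens(token_ids, mask_prob=MASK_PROB, seed=None):
    """Randomly replace a fraction of text tokens with [MASK].

    In cross-modal MLM the image features are left untouched; only the
    sentence tokens are masked, and the model must recover them from the
    remaining text plus the visual context.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            masked.append(MASK_ID)
            labels.append(tok)      # supervise only the masked positions
        else:
            masked.append(tok)
            labels.append(-100)     # ignored by a typical cross-entropy loss
    return masked, labels

# Toy example: token ids of a short question
ids, labels = mask_tokens([2054, 3609, 2003, 1996, 4937], seed=0)
```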

Visfis: Visual feature importance supervision with right-for-the-right-reason objectives

Z Ying, P Hase, M Bansal - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Many past works aim to improve visual reasoning in models by supervising feature
importance (estimated by model explanation techniques) with human annotations such as …
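
To make the idea of supervising feature importance concrete, here is a minimal sketch of one possible alignment term between model-estimated importance and a human relevance annotation. The KL form and the function name are assumptions for illustration; they do not reproduce the paper's combined objectives.

```python
import numpy as np

def importance_alignment_loss(model_importance, human_mask, eps=1e-8):
    """Toy 'right-for-the-right-reason' term: penalize disagreement between
    model-estimated feature importance and a human relevance annotation.

    model_importance: (num_regions,) non-negative scores from an explanation
                      method (e.g. attention or gradient-based saliency).
    human_mask:       (num_regions,) binary or soft human annotation.
    """
    p = human_mask / (human_mask.sum() + eps)        # normalize to distributions
    q = model_importance / (model_importance.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))   # KL(human || model)

loss = importance_alignment_loss(np.array([0.1, 0.7, 0.2]), np.array([0.0, 1.0, 0.0]))
```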

Robust visual question answering: Datasets, methods, and future challenges

J Ma, P Wang, D Kong, Z Wang, J Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Visual question answering requires a system to provide an accurate natural language
answer given an image and a natural language question. However, it is widely recognized …

Guiding visual question answering with attention priors

TM Le, V Le, S Gupta… - Proceedings of the …, 2023 - openaccess.thecvf.com
The current success of modern visual reasoning systems is arguably attributed to cross-
modality attention mechanisms. However, in deliberative reasoning such as in VQA …
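
For context on the cross-modality attention the snippet refers to, a minimal sketch of generic scaled dot-product attention from question tokens to image regions follows. This is textbook cross-attention, not the attention-prior guidance the paper proposes.

```python
import numpy as np

def cross_modal_attention(question_tokens, image_regions, d_k=64):
    """Scaled dot-product attention with text queries over image keys/values.

    question_tokens: (num_tokens, d_k) text features acting as queries
    image_regions:   (num_regions, d_k) visual features acting as keys and values
    """
    scores = question_tokens @ image_regions.T / np.sqrt(d_k)     # (T, R)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)                # softmax over regions
    return weights @ image_regions                                # attended visual features

# Toy example: 5 question tokens attending over 36 detected regions
attended = cross_modal_attention(np.random.randn(5, 64), np.random.randn(36, 64))
```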

Co-attention graph convolutional network for visual question answering

C Liu, YY Tan, TT Xia, J Zhang, M Zhu - Multimedia Systems, 2023 - Springer
Visual Question Answering (VQA) is a challenging task that requires a fine-grained
understanding of both the visual content of images and the textual content of questions …
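
As background for the "graph convolutional" part of the title, the sketch below applies one standard Kipf-Welling propagation step over region features. The co-attention coupling between the question graph and the image graph is not reproduced here, and all shapes are illustrative assumptions.

```python
import numpy as np

def gcn_layer(adj, node_feats, weight):
    """One vanilla graph-convolution step: ReLU(D^-1/2 (A+I) D^-1/2 H W).

    adj:        (N, N) adjacency over image regions (or question words)
    node_feats: (N, d_in) node features
    weight:     (d_in, d_out) learnable projection
    """
    a_hat = adj + np.eye(adj.shape[0])                     # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ node_feats @ weight, 0.0)

# Toy example: 36 region features (2048-d) projected to 512-d
h = gcn_layer(np.ones((36, 36)), np.random.randn(36, 2048), np.random.randn(2048, 512))
```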

Cross-modality multiple relations learning for knowledge-based visual question answering

Y Wang, P Li, Q Si, H Zhang, W Zang, Z Lin… - ACM Transactions on …, 2023 - dl.acm.org
Knowledge-based visual question answering not only needs to answer questions based
on images but also incorporates external knowledge to support reasoning in the joint space of …

Visual question answering: A survey on techniques and common trends in recent literature

ACAM de Faria, FC Bastos, JVNA da Silva… - arXiv preprint arXiv …, 2023 - arxiv.org
Visual Question Answering (VQA) is an emerging area of interest for researchers, being a
recent problem in natural language processing and image prediction. In this area, an …