Ask, attend and answer: Exploring question-guided spatial attention for visual question answering

On the explainability of natural language processing deep models

JE Zini, M Awad - ACM Computing Surveys, 2022 - dl.acm.org

Despite their success, deep networks are used as black-box models with outputs that are not
easily explainable during the learning and the prediction phases. This lack of interpretability …

Cited by 72 (6 versions)

Attention mechanism in neural networks: where it comes and where it goes

D Soydaner - Neural Computing and Applications, 2022 - Springer

A long time ago in the machine learning literature, the idea of incorporating a mechanism
inspired by the human visual system into neural networks was introduced. This idea is …

Cited by 92 (8 versions)

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Cited by 3367 (2 versions)

Just ask: Learning to answer questions from millions of narrated videos

A Yang, A Miech, J Sivic, I Laptev… - Proceedings of the …, 2021 - openaccess.thecvf.com

Recent methods for visual question answering rely on large-scale annotated datasets.
Manual annotation of questions and answers for videos, however, is tedious, expensive and …

Cited by 266 (14 versions)

Cross attention network for few-shot classification

R Hou, H Chang, B Ma, S Shan… - Advances in neural …, 2019 - proceedings.neurips.cc

Few-shot classification aims to recognize unlabeled samples from unseen classes given
only few labeled samples. The unseen classes and low-data problem make few-shot …

Cited by 705 (12 versions)

OK-VQA: A visual question answering benchmark requiring external knowledge

K Marino, M Rastegari, A Farhadi… - Proceedings of the …, 2019 - openaccess.thecvf.com

Visual Question Answering (VQA) in its ideal form lets us study reasoning in the
joint space of vision and language and serves as a proxy for the AI task of scene …

Cited by 765 (8 versions)

Pyramid feature attention network for saliency detection

T Zhao, X Wu - Proceedings of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com

Saliency detection is one of the basic challenges in computer vision. Recently, CNNs are the
most widely used and powerful techniques for saliency detection, in which feature maps …

Cited by 787 (8 versions)

Towards VQA models that can read

A Singh, V Natarajan, M Shah… - Proceedings of the …, 2019 - openaccess.thecvf.com

Studies have shown that a dominant class of questions asked by visually impaired users on
images of their surroundings involves reading text in the image. But today's VQA models can …

Cited by 690 (8 versions)

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer

In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

Cited by 176 (8 versions)

MirrorGAN: Learning text-to-image generation by redescription

T Qiao, J Zhang, D Xu, D Tao - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Generating an image from a given text description has two goals: visual realism and
semantic consistency. Although significant progress has been made in generating high …

Cited by 632 (9 versions)
