On the explainability of natural language processing deep models

JE Zini, M Awad - ACM Computing Surveys, 2022 - dl.acm.org
Despite their success, deep networks are used as black-box models with outputs that are not
easily explainable during the learning and the prediction phases. This lack of interpretability …

Attention mechanism in neural networks: where it comes and where it goes

D Soydaner - Neural Computing and Applications, 2022 - Springer
A long time ago in the machine learning literature, the idea of incorporating a mechanism
inspired by the human visual system into neural networks was introduced. This idea is …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Just ask: Learning to answer questions from millions of narrated videos

A Yang, A Miech, J Sivic, I Laptev… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent methods for visual question answering rely on large-scale annotated datasets.
Manual annotation of questions and answers for videos, however, is tedious, expensive and …

Cross attention network for few-shot classification

R Hou, H Chang, B Ma, S Shan… - Advances in neural …, 2019 - proceedings.neurips.cc
Few-shot classification aims to recognize unlabeled samples from unseen classes given
only few labeled samples. The unseen classes and low-data problem make few-shot …

OK-VQA: A visual question answering benchmark requiring external knowledge

K Marino, M Rastegari, A Farhadi… - Proceedings of the …, 2019 - openaccess.thecvf.com
Visual Question Answering (VQA) in its ideal form lets us study reasoning in the
joint space of vision and language and serves as a proxy for the AI task of scene …

Pyramid feature attention network for saliency detection

T Zhao, X Wu - Proceedings of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com
Saliency detection is one of the basic challenges in computer vision. Recently, CNNs are the
most widely used and powerful techniques for saliency detection, in which feature maps …

Towards VQA models that can read

A Singh, V Natarajan, M Shah… - Proceedings of the …, 2019 - openaccess.thecvf.com
Studies have shown that a dominant class of questions asked by visually impaired users on
images of their surroundings involves reading text in the image. But today's VQA models can …

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

MirrorGAN: Learning text-to-image generation by redescription

T Qiao, J Zhang, D Xu, D Tao - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Generating an image from a given text description has two goals: visual realism and
semantic consistency. Although significant progress has been made in generating high …