The multi-modal fusion in visual question answering: a review of attention mechanisms

S Lu, M Liu, L Yin, Z Yin, X Liu, W Zheng - PeerJ Computer Science, 2023 - peerj.com
Abstract Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …

Explaining explanations: An overview of interpretability of machine learning

LH Gilpin, D Bau, BZ Yuan, A Bajwa… - 2018 IEEE 5th …, 2018 - ieeexplore.ieee.org
There has recently been a surge of work in explanatory artificial intelligence (XAI). This
research area tackles the important problem that complex machines and algorithms often …

Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence

S Ali, T Abuhmed, S El-Sappagh, K Muhammad… - Information fusion, 2023 - Elsevier
Artificial intelligence (AI) is currently being utilized in a wide range of sophisticated
applications, but the outcomes of many AI models are challenging to comprehend and trust …

A general survey on attention mechanisms in deep learning

G Brauwers, F Frasincar - IEEE Transactions on Knowledge …, 2021 - ieeexplore.ieee.org
Attention is an important mechanism that can be employed for a variety of deep learning
models across many different domains and tasks. This survey provides an overview of the …

GQA: A new dataset for real-world visual reasoning and compositional question answering

DA Hudson, CD Manning - … of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com
We introduce GQA, a new dataset for real-world visual reasoning and compositional
question answering, seeking to address key shortcomings of previous VQA datasets. We …

Unmasking Clever Hans predictors and assessing what machines really learn

S Lapuschkin, S Wäldchen, A Binder… - Nature …, 2019 - nature.com
Current learning machines have successfully solved hard application problems, reaching
high accuracy and displaying seemingly intelligent behavior. Here we apply recent …

Counterfactual VQA: A cause-effect look at language bias

Y Niu, K Tang, H Zhang, Z Lu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent VQA models may tend to rely on language bias as a shortcut and thus fail to
sufficiently learn the multi-modal knowledge from both vision and language. In this paper …

Salient object detection in the deep learning era: An in-depth survey

W Wang, Q Lai, H Fu, J Shen, H Ling… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
As an essential problem in computer vision, salient object detection (SOD) has attracted an
increasing amount of research attention over the years. Recent advances in SOD are …

A multidisciplinary survey and framework for design and evaluation of explainable AI systems

S Mohseni, N Zarei, ED Ragan - ACM Transactions on Interactive …, 2021 - dl.acm.org
The need for interpretable and accountable intelligent systems grows along with the
prevalence of artificial intelligence (AI) applications used in everyday life. Explainable AI …

Cascade R-CNN: Delving into high quality object detection

Z Cai, N Vasconcelos - … of the IEEE conference on computer …, 2018 - openaccess.thecvf.com
In object detection, an intersection over union (IoU) threshold is required to define positives
and negatives. An object detector, trained with low IoU threshold, e.g. 0.5, usually produces …