The multi-modal fusion in visual question answering: a review of attention mechanisms

S Lu, M Liu, L Yin, Z Yin, X Liu, W Zheng - PeerJ Computer Science, 2023 - peerj.com
Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …

Generative adversarial framework for cold-start item recommendation

H Chen, Z Wang, F Huang, X Huang, Y Xu… - Proceedings of the 45th …, 2022 - dl.acm.org
The cold-start problem has been a long-standing issue in recommendation. Embedding-based
recommendation models provide recommendations by learning embeddings for each …

Graph neural networks in vision-language image understanding: A survey

H Senior, G Slabaugh, S Yuan, L Rossi - The Visual Computer, 2024 - Springer
2D image understanding is a complex problem within computer vision, but it holds
the key to providing human-level scene comprehension. It goes further than identifying the …

Multi-modal information analysis for fault diagnosis with time-series data from power transformer

Z Xing, Y He - International Journal of Electrical Power & Energy …, 2023 - Elsevier
Fault diagnosis is important to the timely repair of the power transformer. However, machine
learning has not been exploited effectively for fault diagnosis due to the limitation of multi …

Dual self-attention with co-attention networks for visual question answering

Y Liu, X Zhang, Q Zhang, C Li, F Huang, X Tang, Z Li - Pattern Recognition, 2021 - Elsevier
Visual Question Answering (VQA) as an important task in understanding vision and
language has been proposed and has attracted wide interest. In previous VQA methods …

ALSA: adversarial learning of supervised attentions for visual question answering

Y Liu, X Zhang, Z Zhao, B Zhang… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Visual question answering (VQA) has gained increasing attention in both natural language
processing and computer vision. The attention mechanism plays a crucial role in relating the …

Inverse adversarial diversity learning for network ensemble

S Zhou, J Wang, L Wang, X Wan, S Hui… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Network ensemble aims to obtain better results by aggregating the predictions of multiple
weak networks, in which how to keep the diversity of different networks plays a critical role in …

A multi-level mesh mutual attention model for visual question answering

Z Lei, G Zhang, L Wu, K Zhang, R Liang - Data Science and Engineering, 2022 - Springer
Visual question answering is a complex multimodal task involving images and text, with
broad application prospects in human–computer interaction and medical assistance …

Counting-based visual question answering with serial cascaded attention deep learning

T MeshuWelde, L Liao - Pattern Recognition, 2023 - Elsevier
Counting-based questions play a major part in Visual Question Answering (VQA); the
most challenging factor is counting the different objects present in the images. Recently …

Dynamic self-attention with vision synchronization networks for video question answering

Y Liu, X Zhang, F Huang, S Shen, P Tian, L Li, Z Li - Pattern Recognition, 2022 - Elsevier
Video Question Answering (VideoQA) has gained increasing attention as an
important task in understanding the rich spatio-temporal contents, i.e., the appearance and …