Adversarial learning with multi-modal attention for visual question answering

S Lu, M Liu, L Yin, Z Yin, X Liu, W Zheng - PeerJ Computer Science, 2023 - peerj.com

Abstract Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …

Cited by 211 - Related articles - All 8 versions

[PDF] archive.org

Generative adversarial framework for cold-start item recommendation

H Chen, Z Wang, F Huang, X Huang, Y Xu… - Proceedings of the 45th …, 2022 - dl.acm.org

The cold-start problem has been a long-standing issue in recommendation. Embedding-based
recommendation models provide recommendations by learning embeddings for each …

Cited by 61 - Related articles - All 3 versions

[PDF] springer.com

Graph neural networks in vision-language image understanding: A survey

H Senior, G Slabaugh, S Yuan, L Rossi - The Visual Computer, 2024 - Springer

Abstract 2D image understanding is a complex problem within computer vision, but it holds
the key to providing human-level scene comprehension. It goes further than identifying the …

Cited by 16 - Related articles - All 7 versions

[HTML] sciencedirect.com

[HTML] Multi-modal information analysis for fault diagnosis with time-series data from power transformer

Z Xing, Y He - International Journal of Electrical Power & Energy …, 2023 - Elsevier

Fault diagnosis is important to the timely repair of the power transformer. However, machine
learning has not been exploited effectively for fault diagnosis due to the limitation of multi …

Cited by 33 - Related articles - All 2 versions

Dual self-attention with co-attention networks for visual question answering

Y Liu, X Zhang, Q Zhang, C Li, F Huang, X Tang, Z Li - Pattern Recognition, 2021 - Elsevier

Abstract Visual Question Answering (VQA) as an important task in understanding vision and
language has been proposed and aroused wide interests. In previous VQA methods …

Cited by 62 - Related articles - All 2 versions

ALSA: adversarial learning of supervised attentions for visual question answering

Y Liu, X Zhang, Z Zhao, B Zhang… - IEEE transactions on …, 2020 - ieeexplore.ieee.org

Visual question answering (VQA) has gained increasing attention in both natural language
processing and computer vision. The attention mechanism plays a crucial role in relating the …

Cited by 31 - Related articles - All 3 versions

[PDF] google.com

Inverse adversarial diversity learning for network ensemble

S Zhou, J Wang, L Wang, X Wan, S Hui… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Network ensemble aims to obtain better results by aggregating the predictions of multiple
weak networks, in which how to keep the diversity of different networks plays a critical role in …

Cited by 6 - Related articles - All 4 versions

[PDF] springer.com

A multi-level mesh mutual attention model for visual question answering

Z Lei, G Zhang, L Wu, K Zhang, R Liang - Data Science and Engineering, 2022 - Springer

Visual question answering is a complex multimodal task involving images and text, with
broad application prospects in human–computer interaction and medical assistance …

Cited by 16 - Related articles - All 3 versions

Counting-based visual question answering with serial cascaded attention deep learning

T MeshuWelde, L Liao - Pattern Recognition, 2023 - Elsevier

The counting-based questions play a major part in Visual Question Answering (VQA), the
most challenging factor is counting the different objects present in the images. Recently …

Cited by 6 - Related articles - All 3 versions

Dynamic self-attention with vision synchronization networks for video question answering

Y Liu, X Zhang, F Huang, S Shen, P Tian, L Li, Z Li - Pattern Recognition, 2022 - Elsevier

Abstract Video Question Answering (VideoQA) has gained increasing attention as an
important task in understanding the rich spatio-temporal contents, i.e., the appearance and …

Cited by 8 - Related articles - All 3 versions
