The multi-modal fusion in visual question answering: a review of attention mechanisms

S Lu, M Liu, L Yin, Z Yin, X Liu, W Zheng - PeerJ Computer Science, 2023 - peerj.com
Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …
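
Since this entry concerns attention-based fusion of image and question features, here is a minimal, hypothetical sketch of question-guided attention fusion for VQA, assuming pre-extracted region features and a pooled question embedding; the dimensions, answer-vocabulary size, and concatenation-based fusion are illustrative assumptions, not the survey's taxonomy.

import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, img_dim=2048, q_dim=768, hidden=512, n_answers=3000):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)
        self.q_proj = nn.Linear(q_dim, hidden)
        self.att = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden + q_dim, n_answers)

    def forward(self, img_regions, q_emb):
        # img_regions: (B, R, img_dim) region features; q_emb: (B, q_dim) question embedding
        img_h = self.img_proj(img_regions)                                 # (B, R, hidden)
        joint = torch.tanh(img_h + self.q_proj(q_emb).unsqueeze(1))        # question-guided scores
        weights = torch.softmax(self.att(joint), dim=1)                    # (B, R, 1) attention over regions
        attended = (weights * img_h).sum(dim=1)                            # (B, hidden) attended image feature
        return self.classifier(torch.cat([attended, q_emb], dim=-1))       # answer logits after concat fusion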

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Multiscale feature extraction and fusion of image and text in VQA

S Lu, Y Ding, M Liu, Z Yin, L Yin, W Zheng - International Journal of …, 2023 - Springer
Visual Question Answering (VQA) is the process of finding useful
information in images related to a question in order to answer that question correctly. It can be …
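
As an illustration of the multiscale extraction-and-fusion idea named in this title, the sketch below pools the same image feature map at several spatial scales and concatenates the results with a question embedding; the scale choices, dimensions, and linear fusion head are assumptions for illustration, not the paper's design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscaleFusion(nn.Module):
    def __init__(self, img_channels=512, q_dim=768, out_dim=1024):
        super().__init__()
        self.scales = (1, 2, 4)                               # pooled grid sizes (assumed)
        pooled_dim = img_channels * sum(s * s for s in self.scales)
        self.fuse = nn.Linear(pooled_dim + q_dim, out_dim)

    def forward(self, feat_map, q_emb):
        # feat_map: (B, C, H, W) image features; q_emb: (B, q_dim) question embedding
        pooled = [F.adaptive_avg_pool2d(feat_map, s).flatten(1) for s in self.scales]
        return self.fuse(torch.cat(pooled + [q_emb], dim=-1))  # fused multiscale representation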

A review on the attention mechanism of deep learning

Z Niu, G Zhong, H Yu - Neurocomputing, 2021 - Elsevier
Attention has arguably become one of the most important concepts in the deep learning
field. It is inspired by the biological systems of humans, which tend to focus on the distinctive …
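
For context on the basic mechanism such reviews cover, this is the standard scaled dot-product attention written out explicitly; it is a generic formulation, not content specific to this survey.

import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q: (B, Tq, d) queries; k, v: (B, Tk, d) keys and values
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # (B, Tq, Tk) similarity scores
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # block masked positions
    weights = torch.softmax(scores, dim=-1)                    # attention distribution
    return weights @ v, weights                                # context vectors and weights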

A general survey on attention mechanisms in deep learning

G Brauwers, F Frasincar - IEEE Transactions on Knowledge …, 2021 - ieeexplore.ieee.org
Attention is an important mechanism that can be employed for a variety of deep learning
models across many different domains and tasks. This survey provides an overview of the …
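
Surveys of attention mechanisms commonly distinguish different scoring functions; as one example alongside the dot-product form above, here is a sketch of additive (Bahdanau-style) attention, where scores come from a small feed-forward network. Dimensions are illustrative assumptions.

import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, q_dim, k_dim, hidden=128):
        super().__init__()
        self.wq, self.wk = nn.Linear(q_dim, hidden), nn.Linear(k_dim, hidden)
        self.v = nn.Linear(hidden, 1)

    def forward(self, query, keys, values):
        # query: (B, q_dim); keys, values: (B, T, k_dim)
        scores = self.v(torch.tanh(self.wk(keys) + self.wq(query).unsqueeze(1)))  # (B, T, 1)
        weights = torch.softmax(scores, dim=1)                                    # attention over T positions
        return (weights * values).sum(dim=1), weights.squeeze(-1)                 # context vector and weights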

A review of uncertainty quantification in deep learning: Techniques, applications and challenges

M Abdar, F Pourpanah, S Hussain, D Rezazadegan… - Information fusion, 2021 - Elsevier
Uncertainty quantification (UQ) methods play a pivotal role in reducing the impact of
uncertainties during both optimization and decision making processes. They have been …
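
As one widely used UQ technique of the kind such reviews cover, the sketch below applies Monte Carlo dropout: dropout is kept stochastic at test time and several forward passes are averaged. This is a generic illustration, not a method proposed in the cited review.

import torch
import torch.nn as nn

def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 20):
    # Keep dropout layers stochastic at inference; note this also puts batch-norm layers
    # in training mode, which may not be desired for models that use them.
    model.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])  # (T, B, C) sampled predictions
    return preds.mean(dim=0), preds.var(dim=0)                     # predictive mean and variance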

Counterfactual attention learning for fine-grained visual categorization and re-identification

Y Rao, G Chen, J Lu, J Zhou - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Attention mechanism has demonstrated great potential in fine-grained visual recognition
tasks. In this paper, we present a counterfactual attention learning method to learn more …
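
One common reading of the counterfactual-attention idea is to compare the prediction obtained with the learned attention map against one obtained with a random (counterfactual) attention map, and to train on the resulting gap. The sketch below is a hypothetical illustration of that idea; the function names, pooling, and loss terms are assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def counterfactual_attention_loss(features, att_logits, classifier, labels):
    # features: (B, C, H, W) feature map; att_logits: (B, 1, H, W) unnormalized attention scores
    B, _, H, W = att_logits.shape
    att = torch.softmax(att_logits.view(B, 1, -1), dim=-1).view(B, 1, H, W)
    cf_att = torch.softmax(torch.rand_like(att_logits).view(B, 1, -1), dim=-1).view(B, 1, H, W)
    pooled = (features * att).sum(dim=(2, 3))          # factual attention-weighted pooling
    pooled_cf = (features * cf_att).sum(dim=(2, 3))    # counterfactual (random) attention pooling
    logits, logits_cf = classifier(pooled), classifier(pooled_cf)
    # supervise both the factual prediction and the "effect" of attention (factual minus counterfactual)
    return F.cross_entropy(logits, labels) + F.cross_entropy(logits - logits_cf, labels)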

Multimodal co-attention transformer for survival prediction in gigapixel whole slide images

RJ Chen, MY Lu, WH Weng, TY Chen… - Proceedings of the …, 2021 - openaccess.thecvf.com
Survival outcome prediction is a challenging weakly-supervised and ordinal regression task
in computational pathology that involves modeling complex interactions within the tumor …
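
As a generic illustration of the co-attention building block used in multimodal transformers, the sketch below lets two modality token sequences attend to each other with standard multi-head cross-attention; it shows the mechanism only, not the paper's specific survival-prediction architecture.

import torch
import torch.nn as nn

class CoAttentionBlock(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.a_to_b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.b_to_a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens_a, tokens_b):
        # Each modality queries the other and is enriched with cross-modal context.
        a_ctx, _ = self.a_to_b(query=tokens_a, key=tokens_b, value=tokens_b)
        b_ctx, _ = self.b_to_a(query=tokens_b, key=tokens_a, value=tokens_a)
        return tokens_a + a_ctx, tokens_b + b_ctx   # residual-updated token sequences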

Seeing out of the box: End-to-end pre-training for vision-language representation learning

Z Huang, Z Zeng, Y Huang, B Liu… - Proceedings of the …, 2021 - openaccess.thecvf.com
We study the joint learning of a Convolutional Neural Network (CNN) and a Transformer for
vision-language pre-training (VLPT), which aims to learn cross-modal alignments from …
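
A rough sketch of the joint CNN + Transformer setup this entry describes: a CNN backbone produces grid features that are flattened into visual tokens and encoded together with text token embeddings by a shared Transformer. The choice of ResNet-50, the dimensions, and the plain concatenation are illustrative assumptions, not the paper's exact model.

import torch
import torch.nn as nn
import torchvision

class CNNTransformerVLEncoder(nn.Module):
    def __init__(self, vocab_size=30522, dim=768, layers=6):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)        # no pretrained weights in this sketch
        self.cnn = nn.Sequential(*list(resnet.children())[:-2])   # keep the spatial feature grid
        self.visual_proj = nn.Linear(2048, dim)
        self.text_emb = nn.Embedding(vocab_size, dim)
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, images, token_ids):
        grid = self.cnn(images)                                    # (B, 2048, H, W) grid features
        vis = self.visual_proj(grid.flatten(2).transpose(1, 2))    # (B, H*W, dim) visual tokens
        txt = self.text_emb(token_ids)                             # (B, L, dim) text tokens
        return self.encoder(torch.cat([vis, txt], dim=1))          # joint cross-modal encoding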

Multi-granularity cross-modal alignment for generalized medical visual representation learning

F Wang, Y Zhou, S Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Learning medical visual representations directly from paired radiology reports has become
an emerging topic in representation learning. However, existing medical image-text joint …
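
As a minimal sketch of one granularity of cross-modal alignment, the function below implements the standard symmetric contrastive (CLIP-style) objective over pooled image and report embeddings; it is not the paper's full multi-granularity loss, and the temperature value is an assumption.

import torch
import torch.nn.functional as F

def global_alignment_loss(img_emb, txt_emb, temperature=0.07):
    # img_emb, txt_emb: (B, D) pooled embeddings of paired images and reports
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature                   # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device) # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))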