The multi-modal fusion in visual question answering: a review of attention mechanisms

S Lu, M Liu, L Yin, Z Yin, X Liu, W Zheng - PeerJ Computer Science, 2023 - peerj.com
Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …

Image-text retrieval: A survey on recent research and development

M Cao, S Li, J Li, L Nie, M Zhang - arXiv preprint arXiv:2203.14713, 2022 - arxiv.org
In the past few years, cross-modal image-text retrieval (ITR) has experienced increased
interest in the research community due to its excellent research value and broad real-world …

Cross-modal implicit relation reasoning and aligning for text-to-image person retrieval

D Jiang, M Ye - Proceedings of the IEEE/CVF Conference …, 2023 - openaccess.thecvf.com
Text-to-image person retrieval aims to identify the target person based on a given textual
description query. The primary challenge is to learn the mapping of visual and textual …

Self-supervised learning: Generative or contrastive

X Liu, F Zhang, Z Hou, L Mian, Z Wang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Deep supervised learning has achieved great success in the last decade. However, its
defects of heavy dependence on manual labels and vulnerability to attacks have driven …

Dual-level representation enhancement on characteristic and context for image-text retrieval

S Yang, Q Li, W Li, X Li, AA Liu - IEEE Transactions on Circuits …, 2022 - ieeexplore.ieee.org
Image-text retrieval is a fundamental and vital task in multi-media retrieval and has received
growing attention since it connects heterogeneous data. Previous methods that perform well …

CLIP-driven fine-grained text-image person re-identification

S Yan, N Dong, L Zhang, J Tang - IEEE Transactions on Image …, 2023 - ieeexplore.ieee.org
Text-Image Person Re-identification (TIReID) aims to retrieve the image corresponding to
the given text query from a pool of candidate images. Existing methods employ prior …

Cross-modality person re-identification with shared-specific feature transfer

Y Lu, Y Wu, B Liu, T Zhang, B Li… - Proceedings of the …, 2020 - openaccess.thecvf.com
Cross-modality person re-identification (cm-ReID) is a challenging but key technology for
intelligent video analysis. Existing works mainly focus on learning modality-shared …

FashionVLP: Vision language transformer for fashion retrieval with feedback

S Goenka, Z Zheng, A Jaiswal… - Proceedings of the …, 2022 - openaccess.thecvf.com
Fashion image retrieval based on a query pair of reference image and natural language
feedback is a challenging task that requires models to assess fashion related information …

See finer, see more: Implicit modality alignment for text-based person retrieval

X Shu, W Wen, H Wu, K Chen, Y Song, R Qiao… - … on Computer Vision, 2022 - Springer
Text-based person retrieval aims to find the query person based on a textual description.
The key is to learn a common latent space mapping between visual-textual modalities. To …

Learning granularity-unified representations for text-to-image person re-identification

Z Shao, X Zhang, M Fang, Z Lin, J Wang… - Proceedings of the 30th …, 2022 - dl.acm.org
Text-to-image person re-identification (ReID) aims to search for pedestrian images of an
interested identity via textual descriptions. It is challenging due to both rich intra-modal …