Learning semantic relationship among instances for image-text matching

Z Fu, Z Mao, Y Song, Y Zhang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Image-text matching, a bridge connecting image and language, is an important task, which
generally learns a holistic cross-modal embedding to achieve a high-quality semantic …

Cross-modal active complementary learning with self-refining correspondence

Y Qin, Y Sun, D Peng, JT Zhou… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recently, image-text matching has attracted more and more attention from academia and
industry, which is fundamental to understanding the latent correspondence across visual …

Your negative may not be true negative: Boosting image-text matching with false negative elimination

H Li, Y Bin, J Liao, Y Yang, HT Shen - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Most existing image-text matching methods adopt triplet loss as the optimization objective,
and choosing a proper negative sample for the triplet of <anchor, positive, negative> is …

Composing object relations and attributes for image-text matching

K Pham, C Huynh, SN Lim… - Proceedings of the …, 2024 - openaccess.thecvf.com
We study the visual semantic embedding problem for image-text matching. Most existing
work utilizes a tailored cross-attention mechanism to perform local alignment across the two …

GSSF: Generalized structural sparse function for deep cross-modal metric learning

H Diao, Y Zhang, S Gao, J Zhu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Cross-modal metric learning is a prominent research topic that bridges the semantic
heterogeneity between vision and language. Existing methods frequently utilize simple …

Towards deconfounded image-text matching with causal inference

W Li, X Su, D Song, L Wang, K Zhang… - Proceedings of the 31st …, 2023 - dl.acm.org
Prior image-text matching methods have shown remarkable performance on many
benchmark datasets, but most of them overlook the bias in the dataset, which exists in intra …

Enhanced semantic similarity learning framework for image-text matching

K Zhang, B Hu, H Zhang, Z Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Image-text matching is a fundamental task to bridge vision and language. The critical
challenge lies in accurately learning the semantic similarity between these two …

3SHNet: Boosting image–sentence retrieval via visual semantic–spatial self-highlighting

X Ge, S Xu, F Chen, J Wang, G Wang, S An… - Information Processing & …, 2024 - Elsevier
In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed
3SHNet) for high-precision, high-efficiency and high-generalization image–sentence …

Unlocking the Power of Cross-Dimensional Semantic Dependency for Image-Text Matching

K Zhang, L Zhang, B Hu, M Zhu, Z Mao - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Image-text matching, as a fundamental cross-modal task, bridges vision and language. The
key challenge lies in accurately learning the semantic similarity of these two heterogeneous …

PointCloud-Text Matching: Benchmark Datasets and a Baseline

Y Feng, Y Qin, D Peng, H Zhu, X Peng, P Hu - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we present and study a new instance-level retrieval task: PointCloud-Text
Matching (PTM), which aims to find the exact cross-modal instance that matches a given …