Enhanced semantic similarity learning framework for image-text matching

K Zhang, B Hu, H Zhang, Z Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Image-text matching is a fundamental task to bridge vision and language. The critical
challenge lies in accurately learning the semantic similarity between these two …

Unlocking the Power of Cross-Dimensional Semantic Dependency for Image-Text Matching

K Zhang, L Zhang, B Hu, M Zhu, Z Mao - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Image-text matching, as a fundamental cross-modal task, bridges vision and language. The
key challenge lies in accurately learning the semantic similarity of these two heterogeneous …

CAPro: webly supervised learning with cross-modality aligned prototypes

Y Qin, X Chen, Y Shen, C Fu, Y Gu… - Advances in Neural …, 2024 - proceedings.neurips.cc
Webly supervised learning has attracted increasing attention for its effectiveness in
exploring publicly accessible data at scale without manual annotation. However, most …

MiC: Image-text Matching in Circles with cross-modal generative knowledge enhancement

X Pu, Y Chen, L Yuan, Y Zhang, H Li, L Jing… - Knowledge-Based …, 2024 - Elsevier
Image-text matching is a challenging task due to vast discrepancies between the visual and
textual modalities. Existing solutions tend to focus on a limited set of strongly aligned or …

Integrating listwise ranking into pairwise-based image-text retrieval

Z Li, C Guo, X Wang, H Zhang, Y Wang - Knowledge-Based Systems, 2024 - Elsevier
Image-Text Retrieval (ITR) is essentially a ranking problem. Given a query caption,
the goal is to rank candidate images in descending order of relevance. The current ITR …
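The ranking view of ITR in the entry above can be illustrated with a minimal sketch. It assumes that a caption and each candidate image have already been encoded into embedding vectors of the same size by some encoder (hypothetical toy 2-d vectors here). It also assumes cosine similarity as the relevance score and Recall@K as the evaluation check. This shows only the generic "score, then sort from largest to smallest" step and is not the listwise method proposed in that paper.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def rank_images(query: List[float], images: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Score every candidate image against the caption embedding and
    return (name, score) pairs sorted by relevance, largest first."""
    scored = [(name, cosine(query, emb)) for name, emb in images.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def recall_at_k(ranking: List[Tuple[str, float]], relevant: Set[str], k: int) -> float:
    """1.0 if any ground-truth image appears in the top-k results, else 0.0."""
    return float(any(name in relevant for name, _ in ranking[:k]))


# Hypothetical caption and image embeddings, for illustration only.
caption = [1.0, 0.0]
candidates = {"img_a": [0.0, 1.0], "img_b": [1.0, 0.1], "img_c": [0.7, 0.7]}
ranking = rank_images(caption, candidates)
print([name for name, _ in ranking])
```

With these toy vectors the order is img_b, img_c, img_a. If img_c were the ground-truth match, Recall@1 would be 0.0 and Recall@2 would be 1.0. Pairwise and listwise training objectives differ in how they shape these scores during learning, but retrieval at test time still reduces to this sort.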

Reply with Sticker: New Dataset and Model for Sticker Retrieval

B Liang, B Wang, Z Bai, Q Lang, M Sun, K Hou… - arXiv preprint arXiv …, 2024 - arxiv.org
Using stickers in online chatting is very prevalent on social media platforms, where the
stickers used in the conversation can express someone's intention/emotion/attitude in a …

Fast, Accurate, and Lightweight Memory-Enhanced Embedding Learning Framework for Image-Text Retrieval

Z Li, L Zhang, K Zhang, Y Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Image-text retrieval is a fundamental task in bridging the semantics between vision and
language. The key challenge lies in accurately and efficiently learning the semantic …

Improving Image-Text Matching with Bidirectional Consistency of Cross-Modal Alignment

Z Li, L Zhang, K Zhang, Y Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Image-text matching is a fundamental task in bridging the semantics between vision and
language. The key challenge lies in establishing accurate alignment between two …

An empirical study of excitation and aggregation design adaptions in CLIP4Clip for video–text retrieval

X Jing, G Yang, J Chu - Neurocomputing, 2024 - Elsevier
The CLIP4Clip model, transferred from CLIP, has become the de facto standard for the
video clip retrieval task from frame-level input, triggering a surge of CLIP4Clip-based …

Team HUGE: Image-Text Matching via Hierarchical and Unified Graph Enhancing

B Li, Y Wu, Z Li - Proceedings of the 2024 International Conference on …, 2024 - dl.acm.org
Graph structures can represent rich semantic relationships, but they have not yet been well
applied in current image-text matching methods. How to efficiently achieve graph learning …