Exploiting fine-grained correspondence and visual-semantic alignments has shown great potential in image-text matching. Generally, recent approaches first employ a cross-modal …
K Zhang, B Hu, H Zhang, Z Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Image-text matching is a fundamental task to bridge vision and language. The critical challenge lies in accurately learning the semantic similarity between these two …
Q Ma, J Pan, C Bai - IEEE Transactions on Geoscience and …, 2024 - ieeexplore.ieee.org
Image-text retrieval has developed rapidly in recent years. However, it is still a challenge in remote sensing due to visual-semantic imbalance, which leads to incorrect matching of …
In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed 3SHNet) for high-precision, high-efficiency and high-generalization image–sentence …
Although the attention mechanism in transformers has proven successful in image-text retrieval tasks, most transformer models suffer from a large number of parameters. Inspired …
K Zhang, L Zhang, B Hu, M Zhu, Z Mao - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Image-text matching, as a fundamental cross-modal task, bridges vision and language. The key challenge lies in accurately learning the semantic similarity of these two heterogeneous …
T Yao, Y Li, Y Li, Y Zhu, G Wang, J Yue - ACM Transactions on …, 2023 - dl.acm.org
Image-text matching plays an important role in solving the problem of cross-modal information processing. Since there are nonnegligible semantic differences between …
J Li, L Niu, L Zhang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Recently, Video Question-Answering (VideoQA) has drawn more and more attention from both industry and the research community. Despite all the success achieved by …
H Diao, Y Zhang, S Gao, X Ruan… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Image-text matching remains a challenging task due to heterogeneous semantic diversity across modalities and insufficient distance separability within triplets. Different from previous …