A recipe for scaling up text-to-video generation with text-free videos

X Wang, S Zhang, H Yuan, Z Qing… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion-based text-to-video generation has witnessed impressive progress in the past year
yet still falls behind text-to-image generation. One of the key reasons is the limited scale of …

Plug-and-play regulators for image-text matching

H Diao, Y Zhang, W Liu, X Ruan… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Exploiting fine-grained correspondence and visual-semantic alignments has shown great
potential in image-text matching. Generally, recent approaches first employ a cross-modal …

Enhanced semantic similarity learning framework for image-text matching

K Zhang, B Hu, H Zhang, Z Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Image-text matching is a fundamental task to bridge vision and language. The critical
challenge lies in accurately learning the semantic similarity between these two …

Direction-oriented visual-semantic embedding model for remote sensing image-text retrieval

Q Ma, J Pan, C Bai - IEEE Transactions on Geoscience and …, 2024 - ieeexplore.ieee.org
Image-text retrieval has developed rapidly in recent years. However, it is still a challenge in
remote sensing due to visual-semantic imbalance, which leads to incorrect matching of …

3SHNet: Boosting image–sentence retrieval via visual semantic–spatial self-highlighting

X Ge, S Xu, F Chen, J Wang, G Wang, S An… - Information Processing & …, 2024 - Elsevier
In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed
3SHNet) for high-precision, high-efficiency and high-generalization image–sentence …

Reservoir computing transformer for image-text retrieval

W Li, Z Ma, LJ Deng, P Wang, J Shi, X Fan - Proceedings of the 31st …, 2023 - dl.acm.org
Although the attention mechanism in transformers has proven successful in image-text
retrieval tasks, most transformer models suffer from a large number of parameters. Inspired …

Unlocking the Power of Cross-Dimensional Semantic Dependency for Image-Text Matching

K Zhang, L Zhang, B Hu, M Zhu, Z Mao - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Image-text matching, as a fundamental cross-modal task, bridges vision and language. The
key challenge lies in accurately learning the semantic similarity of these two heterogeneous …

Cross-Modal Semantically Augmented Network for Image-Text Matching

T Yao, Y Li, Y Li, Y Zhu, G Wang, J Yue - ACM Transactions on …, 2023 - dl.acm.org
Image-text matching plays an important role in solving the problem of cross-modal
information processing. Since there are nonnegligible semantic differences between …

Knowledge Proxy Intervention for Deconfounded Video Question Answering

J Li, L Niu, L Zhang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Recently, Video Question-Answering (VideoQA) has drawn more and more
attention from both industry and research community. Despite all the success achieved by …

Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching

H Diao, Y Zhang, S Gao, X Ruan… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Image-text matching remains a challenging task due to heterogeneous semantic diversity
across modalities and insufficient distance separability within triplets. Different from previous …