ALIP: Adaptive language-image pre-training with synthetic caption

K Yang, J Deng, X An, J Li, Z Feng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract Contrastive Language-Image Pre-training (CLIP) has significantly boosted the
performance of various vision-language tasks by scaling up the dataset with image-text pairs …

Target before shooting: Accurate anomaly detection and localization under one millisecond via cascade patch retrieval

H Li, J Hu, B Li, H Chen, Y Zheng, C Shen - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, by re-examining the "matching" nature of Anomaly Detection (AD), we propose
a new AD framework that simultaneously enjoys new records of AD accuracy and …

Consistent penalizing field loss for zero-shot image retrieval

C Liu, W She, M Chen, X Li, SX Yang - Expert Systems with Applications, 2024 - Elsevier
Zero-shot image retrieval involves retrieving images of unseen classes using a query image
of the same class. To determine whether a given image is of the same class as the query …

Data-Efficient Multimodal Fusion on a Single GPU

N Vouitsis, Z Liu, SK Gorti… - Proceedings of the …, 2024 - openaccess.thecvf.com
The goal of multimodal alignment is to learn a single latent space that is shared between
multimodal inputs. The most powerful models in this space have been trained using massive …

Rethinking Self-supervised Learning for Cross-domain Adversarial Sample Recovery

Y Li, P Angelov, N Suri - … of the International Joint Conference on …, 2024 - ssg.lancs.ac.uk
Adversarial attacks can cause misclassification in machine learning pipelines, posing a
significant safety risk in critical applications such as autonomous systems or medical …

Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval

H Huang, Z Nie, Z Wang, Z Shang - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Current image-text retrieval methods have demonstrated impressive performance in recent
years. However, they still face two problems: the inter-modal matching missing problem and …

Hugs Bring Double Benefits: Unsupervised Cross-Modal Hashing with Multi-granularity Aligned Transformers

J Wang, Z Zeng, B Chen, Y Wang, D Liao, G Li… - International Journal of …, 2024 - Springer
Unsupervised cross-modal hashing (UCMH) has been commonly explored to support
large-scale cross-modal retrieval of unlabeled data. Despite promising progress, most existing …

Multimodal Pathology Image Search Between H&E Slides and Multiplexed Immunofluorescent Images

A Hajighasemi, MD Saurav, MS Nasr, JP Veerla… - arXiv preprint arXiv …, 2023 - arxiv.org
We present an approach for multimodal pathology image search, using dynamic time
warping (DTW) on Variational Autoencoder (VAE) latent space that is fed into a ranked …

Efficient Image Retrieval Using Hierarchical K-Means Clustering

D Park, Y Hwang - Sensors, 2024 - mdpi.com
The objective of content-based image retrieval (CBIR) is to locate samples from a database
that are akin to a query, relying on the content embedded within the images. A contemporary …

Self-Supervised Representation Learning for Adversarial Attack Detection

Y Li, P Angelov, N Suri - arXiv preprint arXiv:2407.04382, 2024 - arxiv.org
Supervised learning-based adversarial attack detection methods rely on a large number of
labeled data and suffer significant performance degradation when applying the trained …