15M Multimodal Facial Image-Text Dataset

D Dai, YT Li, YG Liu, M Jia, Z YuanHui… - arXiv preprint arXiv …, 2024 - arxiv.org
Currently, image-text-driven multi-modal deep learning models have demonstrated their
outstanding potential in many fields. In practice, tasks centered around facial images have …

Integrating vision-language semantic graphs in multi-view clustering

J Ke, Z Wen, Y Yang, C Cui, Y Ren, X Pu… - Proceedings of the Thirty …, 2024 - ijcai.org
In recent years, a variety of graph learning-based multi-view clustering (MVC) methods have
emerged. However, these methods continue to face challenges in extracting latent features …

LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation

P Yin, J Wang, G Zeng, D Xie, J Zhu - European Conference on Computer …, 2024 - Springer
The ability of gaze estimation models to generalize is often significantly hindered by various
factors unrelated to gaze, especially when the training dataset is limited. Current strategies …

CLIP-Gaze: Towards General Gaze Estimation via Visual-Linguistic Model

P Yin, G Zeng, J Wang, D Xie - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Gaze estimation methods often experience significant performance degradation when
evaluated across different domains, due to the domain gap between the testing and training …

Quantum visual feature encoding revisited

XB Nguyen, HQ Nguyen, H Churchill, SU Khan… - Quantum Machine …, 2024 - Springer
Although quantum machine learning has been introduced for a while, its applications in
computer vision are still limited. This paper, therefore, revisits the quantum visual encoding …

Leveraging CLIP for Inferring Sensitive Information and Improving Model Fairness

M Zhang, R Chunara - arXiv preprint arXiv:2403.10624, 2024 - arxiv.org
Performance disparities across sub-populations are known to exist in deep learning-based
vision recognition models, but previous work has largely addressed such fairness concerns …

Unlocking Visual Secrets: Inverting Features with Diffusion Priors for Image Reconstruction

SQ Zhang, Z Li, C Guo, S Mahloujifar… - arXiv preprint arXiv …, 2024 - arxiv.org
Inverting visual representations within deep neural networks (DNNs) presents a challenging
and important problem in the field of security and privacy for deep learning. The main goal is …

QClusformer: A Quantum Transformer-based Framework for Unsupervised Visual Clustering

XB Nguyen, HQ Nguyen, SYC Chen, SU Khan… - arXiv preprint arXiv …, 2024 - arxiv.org
Unsupervised vision clustering, a cornerstone in computer vision, has been studied for
decades, yielding significant outcomes across numerous vision tasks. However, these …

CLIP Unreasonable Potential in Single-Shot Face Recognition

NT Luu - arXiv preprint arXiv:2411.12319, 2024 - arxiv.org
Face recognition is a core task in computer vision designed to identify and authenticate
individuals by analyzing facial patterns and features. This field intersects with artificial …