CDUL: CLIP-driven unsupervised learning for multi-label image classification

Boosting transferability in vision-language attacks via diversification along the intersection region of adversarial trajectory

S Gao, X Jia, X Ren, I Tsang, Q Guo - European Conference on Computer …, 2025 - Springer

Vision-language pre-training (VLP) models exhibit remarkable capabilities in
comprehending both images and text, yet they remain susceptible to multimodal adversarial …

Cited by 12 · All 2 versions

[PDF] arxiv.org

Pedestrian attribute recognition via CLIP based prompt vision-language fusion

X Wang, J Jin, C Li, J Tang, C Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Existing pedestrian attribute recognition (PAR) algorithms adopt pre-trained CNN (e.g.,
ResNet) as their backbone network for visual feature learning, which might obtain sub …

Cited by 8 · All 2 versions

Triplane-Smoothed Video Dehazing with CLIP-Enhanced Generalization

J Ren, H Chen, T Ye, H Wu, L Zhu - International Journal of Computer …, 2024 - Springer

Video dehazing is a critical research area in computer vision that aims to enhance the
quality of hazy frames, which benefits many downstream tasks, e.g., semantic segmentation …

Cited by 2

[PDF] arxiv.org

Evaluating fairness in large vision-language models across diverse demographic attributes and prompts

X Wu, Y Wang, HT Wu, Z Tao, Y Fang - arXiv preprint arXiv:2406.17974, 2024 - arxiv.org

Large vision-language models (LVLMs) have recently achieved significant progress,
demonstrating strong capabilities in open-world visual understanding. However, it is not yet …

Cited by 3 · All 2 versions

[PDF] aaai.org

CLIP-Gaze: Towards General Gaze Estimation via Visual-Linguistic Model

P Yin, G Zeng, J Wang, D Xie - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org

Gaze estimation methods often experience significant performance degradation when
evaluated across different domains, due to the domain gap between the testing and training …

Cited by 4 · All 3 versions

[PDF] thecvf.com

Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving

J Li, B Li, Z Tu, X Liu, Q Guo… - Proceedings of the …, 2024 - openaccess.thecvf.com

Vision-centric perception systems for autonomous driving have gained considerable
attention recently due to their cost-effectiveness and scalability, especially compared to …

Cited by 10 · All 3 versions

[PDF] ieee.org

PLGAN: Generative adversarial networks for power-line segmentation in aerial images

R Abdelfattah, X Wang, S Wang - IEEE Transactions on Image …, 2023 - ieeexplore.ieee.org

Accurate segmentation of power lines in various aerial images is very important for UAV
flight safety. The complex background and very thin structures of power lines, however …

Cited by 18 · All 9 versions

[PDF] thecvf.com

Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships

R Daroya, A Sun, S Maji - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Modeling and visualizing relationships between tasks or datasets is an important step
towards solving various meta-tasks such as dataset discovery, multi-tasking, and transfer …

Adaptive mixed-scale feature fusion network for blind AI-generated image quality assessment

T Zhou, S Tan, W Zhou, Y Luo… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

With the increasing maturity of the text-to-image and image-to-image generative models,
AI-generated images (AGIs) have shown great application potential in advertisement …

Cited by 5 · All 4 versions

[PDF] arxiv.org

Multi-label cluster discrimination for visual representation learning

X An, K Yang, X Dai, Z Feng, J Deng - European Conference on Computer …, 2025 - Springer

Contrastive Language-Image Pre-training (CLIP) has recently demonstrated
success across various tasks due to superior feature representation empowered by image …

Cited by 1
