Boosting transferability in vision-language attacks via diversification along the intersection region of adversarial trajectory

S Gao, X Jia, X Ren, I Tsang, Q Guo - European Conference on Computer …, 2025 - Springer
Vision-language pre-training (VLP) models exhibit remarkable capabilities in
comprehending both images and text, yet they remain susceptible to multimodal adversarial …

Pedestrian attribute recognition via CLIP-based prompt vision-language fusion

X Wang, J Jin, C Li, J Tang, C Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Existing pedestrian attribute recognition (PAR) algorithms adopt pre-trained CNN (e.g.,
ResNet) as their backbone network for visual feature learning, which might obtain sub …

Triplane-Smoothed Video Dehazing with CLIP-Enhanced Generalization

J Ren, H Chen, T Ye, H Wu, L Zhu - International Journal of Computer …, 2024 - Springer
Video dehazing is a critical research area in computer vision that aims to enhance the
quality of hazy frames, which benefits many downstream tasks, e.g., semantic segmentation …

Evaluating fairness in large vision-language models across diverse demographic attributes and prompts

X Wu, Y Wang, HT Wu, Z Tao, Y Fang - arXiv preprint arXiv:2406.17974, 2024 - arxiv.org
Large vision-language models (LVLMs) have recently achieved significant progress,
demonstrating strong capabilities in open-world visual understanding. However, it is not yet …

CLIP-Gaze: Towards General Gaze Estimation via Visual-Linguistic Model

P Yin, G Zeng, J Wang, D Xie - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Gaze estimation methods often experience significant performance degradation when
evaluated across different domains, due to the domain gap between the testing and training …

Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving

J Li, B Li, Z Tu, X Liu, Q Guo… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-centric perception systems for autonomous driving have gained considerable
attention recently due to their cost-effectiveness and scalability, especially compared to …

PLGAN: Generative adversarial networks for power-line segmentation in aerial images

R Abdelfattah, X Wang, S Wang - IEEE Transactions on Image …, 2023 - ieeexplore.ieee.org
Accurate segmentation of power lines in various aerial images is very important for UAV
flight safety. The complex background and very thin structures of power lines, however …

Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships

R Daroya, A Sun, S Maji - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Modeling and visualizing relationships between tasks or datasets is an important step
towards solving various meta-tasks such as dataset discovery, multi-tasking, and transfer …

Adaptive mixed-scale feature fusion network for blind AI-generated image quality assessment

T Zhou, S Tan, W Zhou, Y Luo… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
With the increasing maturity of the text-to-image and image-to-image generative models, AI-
generated images (AGIs) have shown great application potential in advertisement …

Multi-label cluster discrimination for visual representation learning

X An, K Yang, X Dai, Z Feng, J Deng - European Conference on Computer …, 2025 - Springer
Contrastive Language Image Pre-training (CLIP) has recently demonstrated
success across various tasks due to superior feature representation empowered by image …