Distract your attention: Multi-head cross attention network for facial expression recognition

Z Wen, W Lin, T Wang, G Xu - Biomimetics, 2023 - mdpi.com
This paper presents a novel facial expression recognition network, called Distract your
Attention Network (DAN). Our method is based on two key observations in biological visual …

ESSAformer: Efficient transformer for hyperspectral image super-resolution

M Zhang, C Zhang, Q Zhang, J Guo… - Proceedings of the …, 2023 - openaccess.thecvf.com
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-
resolution hyperspectral image from a low-resolution observation. However, the prevailing …

Deep image matting: A comprehensive survey

J Li, J Zhang, D Tao - arXiv preprint arXiv:2304.04672, 2023 - arxiv.org
Image matting refers to extracting a precise alpha matte from natural images, and it plays a
critical role in various downstream applications, such as image editing. Despite being an ill …

DAT++: Spatially dynamic vision transformer with deformable attention

Z Xia, X Pan, S Song, LE Li, G Huang - arXiv preprint arXiv:2309.01430, 2023 - arxiv.org
Transformers have shown superior performance on various vision tasks. Their large
receptive field endows Transformer models with higher representation power than their CNN …

Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation

Y Ma, H Liu, H Wang, H Pan, Y He, J Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Follow-Your-Emoji, a diffusion-based framework for portrait animation, which
animates a reference portrait with target landmark sequences. The main challenge of portrait …

MTP: Advancing remote sensing foundation model via multi-task pretraining

D Wang, J Zhang, M Xu, L Liu, D Wang… - IEEE Journal of …, 2024 - ieeexplore.ieee.org
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing
various image interpretation tasks. Pretraining is an active research topic, encompassing …

Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt

Z Ying, A Liu, T Zhang, Z Yu, S Liang, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the realm of large vision language models (LVLMs), jailbreak attacks serve as a red-
teaming approach to bypass guardrails and uncover safety implications. Existing jailbreaks …

AI-Enhanced Gas Flares Remote Sensing and Visual Inspection: Trends and Challenges

M Al Radi, P Li, S Boumaraf, J Dias, N Werghi… - IEEE …, 2024 - ieeexplore.ieee.org
The real-time analysis of gas flares is one of the most challenging problems in the operation
of various combustion-involving industries, such as oil and gas refineries. Despite the crucial …

SparseSwin: Swin transformer with sparse transformer block

K Pinasthika, BSP Laksono, RBP Irsal, N Yudistira - Neurocomputing, 2024 - Elsevier
Advancements in computer vision research have established the transformer architecture as the
state of the art in computer vision tasks. One of the known drawbacks of the transformer architecture …

ArtGPT-4: Artistic vision-language understanding with adapter-enhanced MiniGPT-4

Z Yuan, H Xue, X Wang, Y Liu, Z Zhao… - arXiv preprint arXiv …, 2023 - huggingface.co
In recent years, large language models (LLMs) have made significant progress in natural
language processing (NLP), with models like ChatGPT and GPT-4 achieving impressive …