MART: Masked affective representation learning via masked temporal distribution distillation

Z Zhang, P Zhao, E Park… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Limited training data is a long-standing problem for video emotion analysis (VEA). Existing
works leverage the power of large-scale image datasets for transferring, while failing to …

Adapt or perish: Adaptive sparse transformer with attentive feature refinement for image restoration

S Zhou, D Chen, J Pan, J Shi… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Transformer-based approaches have achieved promising performance in image restoration
tasks given their ability to model long-range dependencies, which is crucial for recovering …

ExtDM: Distribution extrapolation diffusion model for video prediction

Z Zhang, J Hu, W Cheng, D Paudel… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video prediction is a challenging task due to its inherent uncertainty, especially when
forecasting over a long period. To model the temporal dynamics, advanced methods benefit from …

LAKE-RED: Camouflaged images generation by latent background knowledge retrieval-augmented diffusion

P Zhao, P Xu, P Qin, DP Fan, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Camouflaged vision perception is an important vision task with numerous practical
applications. Due to the expensive collection and labeling costs, this community struggles …

Ordinal label distribution learning

C Wen, X Zhang, X Yao, J Yang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Label distribution learning (LDL) is a recent hot topic, in which ambiguity is modeled via
description degrees of the labels. However, in common LDL tasks, e.g., age estimation, labels …

Joint learning of video scene detection and annotation via multi-modal adaptive context network

Y Xu, L Pan, W Sang, HL Luo, L Li, P Wei… - Expert Systems with …, 2024 - Elsevier
The tasks of scene detection and annotation have gained impressive attention for
understanding video content. The main challenges lie in mitigating the error propagation of …

Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding

M Wu, C Zhao, A Su, D Di, T Fu, D An, M He… - arXiv preprint arXiv …, 2024 - arxiv.org
Understanding of video creativity and content often varies among individuals, with
differences in focal points and cognitive levels across different ages, experiences, and …

Going Beyond Closed Sets: A Multimodal Perspective for Video Emotion Analysis

H Pu, Y Sun, R Song, X Chen, H Jiang, Y Liu… - Chinese Conference on …, 2023 - Springer
Emotion analysis plays a crucial role in understanding video content. Existing studies often
approach it as a closed set classification task, which overlooks the important fact that the …

eMotions: A Large-Scale Dataset for Emotion Recognition in Short Videos

X Wu, H Sun, J Xue, R Zhai, X Kong, J Nie… - arXiv preprint arXiv …, 2023 - arxiv.org
Nowadays, short videos (SVs) are essential to information acquisition and sharing in our life.
The prevailing use of SVs to spread emotions leads to the necessity of emotion recognition …

Facial Affective Behavior Analysis with Instruction Tuning

Y Li, A Dao, W Bao, Z Tan, T Chen, H Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Facial affective behavior analysis (FABA) is crucial for understanding human mental states
from images. However, traditional approaches primarily deploy models to discriminate …