A joint local spatial and global temporal CNN-Transformer for dynamic facial expression recognition

L Wang, X Kang, F Ding, S Nakagawa, F Ren - Applied Soft Computing, 2024 - Elsevier
Unlike conventional video action recognition, Dynamic Facial Expression Recognition
(DFER) tasks exhibit minimal spatial movement of objects. Addressing this distinctive …

MSSTNet: A Multi-Scale Spatio-Temporal CNN-Transformer Network for Dynamic Facial Expression Recognition

L Wang, X Kang, F Ding… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Unlike typical video action recognition, Dynamic Facial Expression Recognition (DFER)
does not involve distinct moving targets but relies on localized changes in facial muscles …