In sign language recognition (SLR) with multimodal data, a sign word can be represented by multiple features, for which there exist an intrinsic property and a mutually complementary …
Q Xiao, M Qin, P Guo, Y Zhao - IEEE Access, 2019 - ieeexplore.ieee.org
A novel multimodal fusion approach is proposed for Chinese sign language (CSL) recognition. This framework, the LSTM2+CHMM model, uses dual long short-term memory …
Egocentric early action prediction aims to recognize actions from the first-person view by only observing a partial video segment, which is challenging due to the limited context …
Y Huang, X Yang, J Gao, J Sang, C Xu - ACM Transactions on …, 2020 - dl.acm.org
Recognizing activities from egocentric multimodal data collected by wearable cameras and sensors is gaining interest, as multimodal methods always benefit from the complementarity …
It is crucial to sample a small portion of relevant frames for efficient video classification. The existing methods mainly develop hand-designed sampling strategies or learn sequential …
B Sun, D Kong, S Wang, L Wang, B Yin - ACM Transactions on …, 2021 - dl.acm.org
Multi-view human action recognition remains a challenging problem due to large view changes. In this article, we propose a transfer learning-based framework called transferable …
R Wang, H Sun, X Nie, Y Lin, X Xi, Y Yin - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Multi-view (representation) learning derives an entity's representation from its multiple observable views to facilitate various downstream tasks. The most challenging topic is how …
K Roy - ACM Transactions on Multimedia Computing …, 2024 - dl.acm.org
With the advent of egocentric cameras, there are new challenges where traditional computer vision is not sufficient to handle this kind of video. Moreover, egocentric cameras often offer …
H Hu, L Wang, GJ Qi - Proceedings of the AAAI Conference on Artificial …, 2019 - aaai.org
Recent advancements in recurrent neural network (RNN) research have demonstrated the superiority of utilizing multiscale structures in learning temporal representations of time …