A benchmark dataset and comparison study for multi-modal human action analytics

J Liu, S Song, C Liu, Y Li, Y Hu - ACM Transactions on Multimedia …, 2020 - dl.acm.org
Large-scale benchmarks provide a solid foundation for the development of action analytics.
Most of the previous activity benchmarks focus on analyzing actions in RGB videos. There is …

Online early-late fusion based on adaptive HMM for sign language recognition

D Guo, W Zhou, H Li, M Wang - ACM Transactions on Multimedia …, 2017 - dl.acm.org
In sign language recognition (SLR) with multimodal data, a sign word can be represented by
multiple features, for which there exist an intrinsic property and a mutually complementary …

Multimodal fusion based on LSTM and a couple conditional hidden Markov model for Chinese sign language recognition

Q Xiao, M Qin, P Guo, Y Zhao - IEEE Access, 2019 - ieeexplore.ieee.org
A novel multimodal fusion approach is proposed for Chinese sign language (CSL)
recognition. This framework, the LSTM2+ CHMM model, uses dual long short-term memory …

Egocentric early action prediction via adversarial knowledge distillation

N Zheng, X Song, T Su, W Liu, Y Yan… - ACM Transactions on …, 2023 - dl.acm.org
Egocentric early action prediction aims to recognize actions from the first-person view by
only observing a partial video segment, which is challenging due to the limited context …

Knowledge-driven egocentric multimodal activity recognition

Y Huang, X Yang, J Gao, J Sang, C Xu - ACM Transactions on …, 2020 - dl.acm.org
Recognizing activities from egocentric multimodal data collected by wearable cameras and
sensors is gaining interest, as multimodal methods always benefit from the complementarity …

A differentiable parallel sampler for efficient video classification

X Wang, L Zhu, F Wu, Y Yang - ACM Transactions on Multimedia …, 2023 - dl.acm.org
It is crucial to sample a small portion of relevant frames for efficient video classification. The
existing methods mainly develop hand-designed sampling strategies or learn sequential …

Joint transferable dictionary learning and view adaptation for multi-view human action recognition

B Sun, D Kong, S Wang, L Wang, B Yin - ACM Transactions on …, 2021 - dl.acm.org
Multi-view human action recognition remains a challenging problem due to large view
changes. In this article, we propose a transfer learning-based framework called transferable …

Multi-View Representation Learning via View-Aware Modulation

R Wang, H Sun, X Nie, Y Lin, X Xi, Y Yin - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Multi-view (representation) learning derives an entity's representation from its multiple
observable views to facilitate various downstream tasks. The most challenging topic is how …

Multimodal Score Fusion with Sparse Low-rank Bilinear Pooling for Egocentric Hand Action Recognition

K Roy - ACM Transactions on Multimedia Computing …, 2024 - dl.acm.org
With the advent of egocentric cameras, there are new challenges where traditional computer
vision is not sufficient to handle this kind of video. Moreover, egocentric cameras often offer …

Learning to adaptively scale recurrent neural networks

H Hu, L Wang, GJ Qi - Proceedings of the AAAI Conference on Artificial …, 2019 - aaai.org
Recent advancements in recurrent neural network (RNN) research have demonstrated the
superiority of utilizing multiscale structures in learning temporal representations of time …