Authors
Prithwish Jana, Swarnabja Bhaumik, Partha Pratim Mohanta
Publication date
2022
Journal
International Journal of Computer Information Systems and Industrial Management Applications
Volume
14
Pages
270-284
Publisher
Machine Intelligence Research Labs
Abstract
Automated event and activity recognition in unconstrained videos has become a societal necessity. In this paper, we address video event classification and analyze the influence of preprocessing through action localization on the classification task. We propose an approach for event classification in videos that is aided by unsupervised preprocessing through temporal attention and subsequent spatial action localization at those specific attentive instants of time. The unsupervised temporal attention is achieved through a graph-based algorithm for selecting representative (key) frames. Our spatial action-localization technique, SALiEnSeA, identifies the most 'dynamic' motion patch in each key frame. It is based on an oil-painting approach of refining and stacking motion components. These focused actions, along with spatial and temporal information, are fed into three separate deep neural-network pipelines consisting of ResNet50 and LSTM. A multi-tier hierarchical fusion then consolidates frame-level and video-level predictions. The experiments are performed on four benchmark datasets: CCV, KCV, UCF-101 and HMDB-51. The holistically developed framework for action localization-aided event classification provides encouraging results. By introducing a separate modality for action-localized SALiEnSeA patches, we obtain improved video-classification performance on top of the traditional modality of RGB frames. This outperforms both standard neural-network based approaches and state-of-the-art multimodal models for video classification.
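The abstract's multi-tier fusion of frame-level and video-level predictions can be illustrated with a minimal late-fusion sketch. This is not the paper's actual fusion scheme; the function name, equal-weight default, and the two-tier averaging (frames → video per modality, then across modalities) are illustrative assumptions.

```python
import numpy as np

def fuse_predictions(frame_probs_per_modality, modality_weights=None):
    """Two-tier late fusion (illustrative sketch): average frame-level class
    probabilities into a video-level score per modality (tier 1), then take a
    weighted average across modalities (tier 2)."""
    # Tier 1: consolidate key-frame predictions into one vector per modality.
    video_level = np.stack([p.mean(axis=0) for p in frame_probs_per_modality])
    if modality_weights is None:
        # Assumed equal weighting when no per-modality weights are given.
        modality_weights = np.full(len(video_level), 1.0 / len(video_level))
    # Tier 2: weighted fusion across modalities, renormalised to a distribution.
    fused = modality_weights @ video_level
    return fused / fused.sum()

# Example: two modalities (RGB frames, action-localized patches), 3 classes,
# each with per-key-frame class probabilities.
rgb   = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])
patch = np.array([[0.5, 0.4, 0.1], [0.4, 0.4, 0.2]])
print(fuse_predictions([rgb, patch]).argmax())  # → 0 (fused class prediction)
```

In this toy setup, the patch modality acts as an additional voting stream on top of the RGB stream, mirroring the abstract's point that the SALiEnSeA-patch modality supplements, rather than replaces, the RGB-frame modality.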