Authors
Yaqing Hou, Hua Yu, Dongsheng Zhou, Pengfei Wang, Hongwei Ge, Jianxin Zhang, Qiang Zhang
Publication date
2021/12
Journal
Neural Computing and Applications
Volume
33
Pages
16439-16450
Publisher
Springer London
Description
In the study of human action recognition, two-stream networks have made excellent progress recently. However, there remain challenges in distinguishing similar human actions in videos. This paper proposes a novel local-aware spatio-temporal attention network with multi-stage feature fusion based on compact bilinear pooling for human action recognition. To elaborate, taking two-stream networks as our essential backbones, the spatial network first employs multiple spatial transformer networks in a parallel manner to locate the discriminative regions related to human actions. Then, we perform feature fusion between the local and global features to enhance the human action representation. Furthermore, the output of the spatial network and the temporal information are fused at a particular layer to learn the pixel-wise correspondences. After that, we bring together three outputs to generate the global …
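The description names compact bilinear pooling as the fusion operator but does not show it. The following is a minimal sketch of the general technique (Tensor Sketch: two count sketches combined by circular convolution), not the paper's implementation. The function names `count_sketch`, `circular_convolve`, and `compact_bilinear`, the toy feature vectors, and the output dimension are all illustrative assumptions. Practical implementations usually compute the convolution with an FFT; the direct form is used here only for readability.

```python
import random

def count_sketch(x, h, s, d):
    """Project x into d dimensions: out[h[i]] += s[i] * x[i]."""
    out = [0.0] * d
    for i, v in enumerate(x):
        out[h[i]] += s[i] * v
    return out

def circular_convolve(a, b):
    """Circular convolution of two equal-length vectors (direct O(d^2) form)."""
    d = len(a)
    out = [0.0] * d
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % d] += ai * bj
    return out

def compact_bilinear(x, y, d, seed=0):
    """Fuse x and y into a d-dim vector that sketches their outer product."""
    rng = random.Random(seed)
    # Random hash buckets and signs, one pair per input vector (assumed fixed
    # after initialisation in a real network).
    h1 = [rng.randrange(d) for _ in x]
    s1 = [rng.choice((-1, 1)) for _ in x]
    h2 = [rng.randrange(d) for _ in y]
    s2 = [rng.choice((-1, 1)) for _ in y]
    return circular_convolve(count_sketch(x, h1, s1, d),
                             count_sketch(y, h2, s2, d))

# Hypothetical toy features standing in for local and global descriptors.
local_feat = [0.5, -1.0, 2.0]
global_feat = [1.0, 0.25]
fused = compact_bilinear(local_feat, global_feat, d=8)
```

The convolution of the two sketches equals a count sketch of the full outer product, which is why the fused vector can stay small (here 8 values) while still approximating second-order feature interactions.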