Authors
Zufan Zhang, Zongming Lv, Chenquan Gan, Qingyi Zhu
Publication date
2020/10/14
Journal
Neurocomputing
Volume
410
Pages
304-316
Publisher
Elsevier
Description
This paper addresses human action recognition using convolutional long short-term memory networks (Conv-LSTM) and fully-connected LSTM (FC-LSTM) with different attention mechanisms. To this end, it designs the spatial-temporal dual-attention network (STDAN), which consists mainly of feature extraction, attention, and fusion modules. Unlike previous work, which mostly uses only features from the high-level fully-connected layer, STDAN extracts features from both the convolutional and fully-connected layers of a convolutional neural network (CNN), enriching the initial video representation. In addition, Conv-LSTM and FC-LSTM are employed to handle long-duration sequential features with different temporal context information. To reinforce the spatial-temporal attention ability, a temporal attention module (TAM) and a joint spatial-temporal attention module (JSTAM) are …
Total citations
[Bar chart: citations per year, 2020 to 2024]