Spatio-temporal vector of locally max pooled features for action recognition in videos

Authors

Ionut Cosmin Duta, Bogdan Ionescu, Kiyoharu Aizawa, Nicu Sebe

Publication date

2017

Conference

Proceedings of the IEEE conference on Computer Vision and Pattern Recognition

Pages

3097-3106

Description

We introduce Spatio-Temporal Vector of Locally Max Pooled Features (ST-VLMPF), a super vector-based encoding method specifically designed for local deep features encoding. The proposed method addresses an important problem of video understanding: how to build a video representation that incorporates the CNN features over the entire video. Feature assignment is carried out at two levels, by using the similarity and spatio-temporal information. For each assignment we build a specific encoding, focused on the nature of deep features, with the goal to capture the highest feature responses from the highest neuron activation of the network. Our ST-VLMPF clearly provides a more reliable video representation than some of the most widely used and powerful encoding approaches (Improved Fisher Vectors and Vector of Locally Aggregated Descriptors), while maintaining a low computational complexity. We conduct experiments on three action recognition datasets: HMDB51, UCF50 and UCF101. Our pipeline obtains state-of-the-art results.
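The similarity-based assignment and max pooling mentioned in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `max_pool_encode`, the use of Euclidean nearest-centroid assignment, the all-zero block for empty groups, and the precomputed codebook are all assumptions made for illustration, and the spatio-temporal assignment level and any normalization are left out. Plain Python lists are used to keep the example dependency-free.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def max_pool_encode(features, centroids):
    """Max-pool local features within each nearest-centroid group.

    features:  list of d-dimensional local descriptors (lists of floats)
    centroids: list of k d-dimensional codebook vectors, assumed to be
               precomputed (for example by k-means; not shown here)
    Returns a flat list of length k * d.
    """
    k, d = len(centroids), len(centroids[0])
    # Assumption: a group with no assigned feature keeps an all-zero block.
    pooled = [[0.0] * d for _ in range(k)]
    seen = [False] * k
    for f in features:
        # Similarity-based assignment: index of the closest centroid.
        j = min(range(k), key=lambda c: dist(f, centroids[c]))
        if seen[j]:
            # Keep the highest response per dimension within the group.
            pooled[j] = [max(a, b) for a, b in zip(pooled[j], f)]
        else:
            pooled[j] = list(f)
            seen[j] = True
    # Concatenate the per-group blocks into one vector.
    return [v for block in pooled for v in block]


# Toy data: three 2-D "deep features" and a two-entry codebook.
features = [[0.9, 0.1], [0.7, 0.4], [0.1, 0.8]]
centroids = [[1.0, 0.0], [0.0, 1.0]]
encoding = max_pool_encode(features, centroids)
```

In the toy data, the first two features fall closest to the first centroid and the third to the second, so each block of the output holds the per-dimension maximum of its group.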

Total citations

Cited by 81

Citations per year: 2017: 2, 2018: 26, 2019: 21, 2020: 10, 2021: 8, 2022: 3, 2023: 10, 2024: 1
