Aggregating frame-level features for large-scale video classification

S Chen, X Wang, Y Tang, X Chen, Z Wu… - arXiv preprint arXiv …, 2017 - arxiv.org
arXiv preprint arXiv:1707.00803, 2017
This paper introduces the system we developed for the Google Cloud & YouTube-8M Video Understanding Challenge, which can be framed as a multi-label classification problem defined on top of the large-scale YouTube-8M Dataset. We employ a large set of techniques to aggregate the provided frame-level feature representations and generate video-level predictions, including several variants of recurrent neural networks (RNN) and generalized VLAD. We also adopt several fusion strategies to explore the complementarity among the models. In terms of the official metric GAP@20 (global average precision at 20), our best fusion model attains 0.84198 on the public 50% of the test data and 0.84193 on the private 50% of the test data, ranking 4th out of 650 teams worldwide in the competition.
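The abstract names GAP@20 as the official metric but does not define it. The sketch below assumes the definition commonly given for the challenge: each video contributes only its top 20 predicted classes, all of those predictions are pooled across videos and ranked by confidence, and average precision is computed over the pooled list, normalized by the total number of ground-truth labels. The function name `gap_at_k` and the dict/set input layout are hypothetical illustrations, not the authors' code or the official evaluation script.

```python
from typing import Dict, List, Sequence, Set, Tuple


def gap_at_k(
    scores: Sequence[Dict[str, float]],
    labels: Sequence[Set[str]],
    k: int = 20,
) -> float:
    """Global average precision over the pooled top-k predictions of all videos.

    scores[v] maps class name -> predicted confidence for video v (hypothetical layout).
    labels[v] is the set of ground-truth classes for video v.
    """
    pooled: List[Tuple[float, bool]] = []  # (confidence, is_correct) for every kept prediction
    for video_scores, video_labels in zip(scores, labels):
        # Keep only this video's k most confident classes.
        top = sorted(video_scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
        pooled.extend((conf, cls in video_labels) for cls, conf in top)

    # Assumed normalizer: every ground-truth label counts, even if it never reached the top k.
    total_positives = sum(len(v) for v in labels)
    if total_positives == 0:
        return 0.0

    # Rank all pooled predictions globally by confidence, highest first.
    pooled.sort(key=lambda item: item[0], reverse=True)

    gap = 0.0
    hits = 0
    for rank, (_, correct) in enumerate(pooled, start=1):
        if correct:
            hits += 1
            # precision at this rank times the recall increment (1 / total_positives)
            gap += (hits / rank) / total_positives
    return gap


if __name__ == "__main__":
    scores = [{"cat": 0.9, "dog": 0.3}, {"car": 0.8, "cat": 0.1}]
    labels = [{"cat"}, {"car", "bus"}]
    # Correct hits land at ranks 1 and 2 of the pooled list; 3 ground-truth labels in total.
    print(gap_at_k(scores, labels))
```

In this toy example the two correct predictions are ranked first and second globally, so each contributes a precision of 1 times a recall step of 1/3, giving 2/3; the missed "bus" label keeps the score below 1. Pooling across videos (rather than averaging per-video AP) is what makes the metric "global": a model is rewarded for calibrating confidences so that its correct predictions on different videos outrank its mistakes.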