Authors
Zelun Luo, Boya Peng, De-An Huang, Alexandre Alahi, Li Fei-Fei
Publication date
2017/1/7
Venue
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Description
We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos. Given a pair of images from a video clip, our framework learns to predict the long-term 3D motions. To reduce the complexity of the learning framework, we propose to describe the motion as a sequence of atomic 3D flows computed with the RGB-D modality. We use a Recurrent Neural Network based encoder-decoder framework to predict these sequences of flows. We argue that in order for the decoder to reconstruct these sequences, the encoder must learn a robust video representation that captures long-term motion dependencies and spatio-temporal relations. We demonstrate the effectiveness of our learned temporal representations on activity classification across multiple modalities and datasets such as NTU RGB+D and MSR Daily Activity 3D. Our framework is generic to any input modality, i.e., RGB, depth, and RGB-D videos.
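The abstract describes an RNN encoder-decoder that summarizes a pair of frames and unrolls that summary into a sequence of atomic 3D flow predictions. Below is a minimal, hypothetical NumPy sketch of that idea; all dimensions, names, and the vanilla-RNN cell are illustrative assumptions, not the paper's actual architecture or trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: frame-feature dim, hidden dim, atomic-flow code dim, horizon.
D_IN, D_H, D_FLOW, T = 64, 32, 8, 4

def rnn_step(x, h, Wx, Wh, b):
    """One vanilla RNN step: h' = tanh(Wx @ x + Wh @ h + b)."""
    return np.tanh(Wx @ x + Wh @ h + b)

# Random parameters stand in for a trained model.
enc = (rng.standard_normal((D_H, D_IN)) * 0.1,
       rng.standard_normal((D_H, D_H)) * 0.1,
       np.zeros(D_H))
dec = (rng.standard_normal((D_H, D_FLOW)) * 0.1,
       rng.standard_normal((D_H, D_H)) * 0.1,
       np.zeros(D_H))
W_out = rng.standard_normal((D_FLOW, D_H)) * 0.1  # hidden state -> atomic-flow code

def predict_flows(frame_a, frame_b):
    """Encode a frame pair into h, then decode T atomic-flow codes."""
    h = np.zeros(D_H)
    for x in (frame_a, frame_b):          # encoder: fold the image pair into h
        h = rnn_step(x, h, *enc)
    flows, prev = [], np.zeros(D_FLOW)
    for _ in range(T):                    # decoder: unroll the long-term motion
        h = rnn_step(prev, h, *dec)
        prev = W_out @ h                  # predicted atomic-flow code at this step
        flows.append(prev)
    return np.stack(flows)                # shape (T, D_FLOW)

flows = predict_flows(rng.standard_normal(D_IN), rng.standard_normal(D_IN))
print(flows.shape)  # (4, 8)
```

The unsupervised signal in the paper comes from reconstructing these flow sequences; after training, the encoder's representation is what gets reused for activity classification.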
Total citations
[Citations-per-year chart, 2017–2024]
Scholar articles
Z Luo, B Peng, DA Huang, A Alahi, L Fei-Fei - Proceedings of the IEEE conference on computer …, 2017