Shuffle and learn: unsupervised learning using temporal order verification

I Misra, CL Zitnick, M Hebert - … , The Netherlands, October 11–14, 2016 …, 2016 - Springer
In this paper, we present an approach for learning a visual representation from the raw
spatiotemporal signals in videos. Our representation is learned without supervision from …

Video representation learning by recognizing temporal transformations

S Jenni, G Meishvili, P Favaro - European conference on computer vision, 2020 - Springer
We introduce a novel self-supervised learning approach to learn representations of videos
that are responsive to changes in the motion dynamics. Our representations can be learned …

Unsupervised representation learning by sorting sequences

HY Lee, JB Huang, M Singh… - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
We present an unsupervised representation learning approach using videos without
semantic labels. We leverage the temporal coherence as a supervisory signal by formulating …

Video representation learning with visual tempo consistency

C Yang, Y Xu, B Dai, B Zhou - arXiv preprint arXiv:2006.15489, 2020 - arxiv.org
Visual tempo, which describes how fast an action goes, has shown its potential in
supervised action recognition. In this work, we demonstrate that visual tempo can also serve …

Learning end-to-end video classification with rank-pooling

B Fernando, S Gould - International Conference on Machine …, 2016 - proceedings.mlr.press
We introduce a new model for representation learning and classification of video
sequences. Our model is based on a convolutional neural network coupled with a novel …

Self-supervised spatiotemporal learning via video clip order prediction

D Xu, J Xiao, Z Zhao, J Shao, D Xie… - Proceedings of the …, 2019 - openaccess.thecvf.com
We propose a self-supervised spatiotemporal learning technique which leverages the
chronological order of videos. Our method can learn the spatiotemporal representation of …

Unsupervised learning of video representations using lstms

N Srivastava, E Mansimov… - … on machine learning, 2015 - proceedings.mlr.press
Abstract We use Long Short Term Memory (LSTM) networks to learn representations of
video sequences. Our model uses an encoder LSTM to map an input sequence into a fixed …

Only time can tell: Discovering temporal data for temporal modeling

L Sevilla-Lara, S Zha, Z Yan… - Proceedings of the …, 2021 - openaccess.thecvf.com
Understanding temporal information and how the visual world changes over time, is a
fundamental ability of intelligent systems. In video understanding, temporal information is at …

Temporal contrastive pretraining for video action recognition

G Lorre, J Rabarisoa, A Orcesi… - Proceedings of the …, 2020 - openaccess.thecvf.com
In this paper, we propose a self-supervised method for video representation learning based
on Contrastive Predictive Coding (CPC)[27]. Previously, CPC has been used to learn …

Deep temporal linear encoding networks

A Diba, V Sharma, L Van Gool - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
The CNN-encoding of features from entire videos for the representation of human actions
has rarely been addressed. Instead, CNN work has focused on approaches to fuse spatial …