Convolutional tensor-train LSTM for spatio-temporal learning

J Su, W Byeon, J Kossaifi, F Huang… - Advances in …, 2020 - proceedings.neurips.cc
Learning from spatio-temporal data has numerous applications such as human-behavior
analysis, object tracking, video compression, and physics simulation. However, existing …

Shifted chunk transformer for spatio-temporal representational learning

X Zha, W Zhu, L Xun, S Yang… - Advances in Neural …, 2021 - proceedings.neurips.cc
Spatio-temporal representational learning has been widely adopted in various fields such as
action recognition, video object segmentation, and action anticipation. Previous spatio …

SwinLSTM: Improving spatiotemporal prediction accuracy using Swin Transformer and LSTM

S Tang, C Li, P Zhang, RN Tang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Integrating CNNs and RNNs to capture spatiotemporal dependencies is a prevalent
strategy for spatiotemporal prediction tasks. However, the property of CNNs to learn local …

AdaMAE: Adaptive masking for efficient spatiotemporal learning with masked autoencoders

WGC Bandara, N Patel, A Gholami… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked Autoencoders (MAEs) learn generalizable representations for image, text,
audio, video, etc., by reconstructing masked input data from tokens of the visible data …

Convolutional state space models for long-range spatiotemporal modeling

J Smith, S De Mello, J Kautz… - Advances in Neural …, 2024 - proceedings.neurips.cc
Effectively modeling long spatiotemporal sequences is challenging due to the need to model
complex spatial correlations and long-range temporal dependencies simultaneously …

SimVP: Towards simple yet powerful spatiotemporal predictive learning

C Tan, Z Gao, S Li, SZ Li - arXiv preprint arXiv:2211.12509, 2022 - arxiv.org
Recent years have witnessed remarkable advances in spatiotemporal predictive learning,
incorporating auxiliary inputs, elaborate neural architectures, and sophisticated training …

Spatiotemporal self-attention modeling with temporal patch shift for action recognition

W Xiang, C Li, B Wang, X Wei, XS Hua… - European Conference on …, 2022 - Springer
Transformer-based methods have recently achieved great advancement on 2D image-
based vision tasks. For 3D video-based tasks such as action recognition, however, directly …

Semi-CNN architecture for effective spatio-temporal learning in action recognition

MC Leong, DK Prasad, YT Lee, F Lin - Applied Sciences, 2020 - mdpi.com
This paper introduces a fusion convolutional architecture for efficient learning of spatio-
temporal features in video action recognition. Unlike 2D convolutional neural networks …

PredRNN++: Towards a resolution of the deep-in-time dilemma in spatiotemporal predictive learning

Y Wang, Z Gao, M Long, J Wang… - … on machine learning, 2018 - proceedings.mlr.press
We present PredRNN++, a recurrent network for spatiotemporal predictive learning. In
pursuit of a great modeling capability for short-term video dynamics, we make our network …

Self-supervised spatiotemporal feature learning via video rotation prediction

L Jing, X Yang, J Liu, Y Tian - arXiv preprint arXiv:1811.11387, 2018 - arxiv.org
The success of deep neural networks generally requires a vast amount of training data to be
labeled, which is expensive and unfeasible in scale, especially for video collections. To …