Vision transformers for action recognition: A survey

A Ulhaq, N Akhtar, G Pogrebna, A Mian - arXiv preprint arXiv:2209.05700, 2022 - arxiv.org
Vision transformers are emerging as a powerful tool to solve computer vision problems.
Recent techniques have also proven the efficacy of transformers beyond the image domain …

ASPnet: Action segmentation with shared-private representation of multiple data sources

B van Amsterdam… - Proceedings of the …, 2023 - openaccess.thecvf.com
Most state-of-the-art methods for action segmentation are based on single input modalities
or naive fusion of multiple data sources. However, effective fusion of complementary …

EFormer: Enhanced Transformer towards Semantic-Contour Features of Foreground for Portraits Matting

Z Wang, Q Miao, Y Xi, P Zhao - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The portrait matting task aims to extract an alpha matte with complete semantics and finely
detailed contours. In comparison to CNN-based approaches, transformers with self-attention …

Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment

A Xu, WS Zheng - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Weakly-supervised action segmentation is a task of learning to partition a long video into
several action segments where training videos are only accompanied by transcripts …

Spatial-temporal graph transformer network for skeleton-based temporal action segmentation

X Tian, Y Jin, Z Zhang, P Liu, X Tang - Multimedia Tools and Applications, 2024 - Springer
Temporal action segmentation (TAS) of minute-long untrimmed videos involves locating and
classifying human action segments using multiple action class labels. Previously, research …

U-Transformer-based multi-levels refinement for weakly supervised action segmentation

X Ke, X Miao, W Guo - Pattern Recognition, 2024 - Elsevier
Action segmentation is a research hotspot in human action analysis, which aims to split
videos into segments of different actions. Recent algorithms have achieved great success in …

Streaming Anchor Loss: Augmenting Supervision with Temporal Significance

UO Sarawgi, J Berkowitz, V Garg… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Streaming neural network models for fast frame-wise responses to various speech and
sensory signals are widely adopted on resource-constrained platforms. Hence, increasing …

A circular window-based cascade transformer for online action detection

S Cao, W Luo, B Wang, W Zhang, L Ma - arXiv preprint arXiv:2208.14209, 2022 - arxiv.org
Online action detection aims at the accurate action prediction of the current frame based on
long historical observations. Meanwhile, it demands real-time inference on online streaming …

Pose-aware video action segmentation

M Zhang, C Liao, Q Li, H Zhang, W Liu - Neural Computing and …, 2024 - Springer
Action segmentation is an emerging task in video understanding, particularly for untrimmed
videos containing multiple actions. However, existing video-based methods may struggle …

Streaming Anchor Loss: Augmenting Supervision with Temporal Significance

J Berkowitz, V Garg, A Kundu, M Cho, SS Buddi… - arXiv preprint arXiv …, 2023 - arxiv.org
Streaming neural network models for fast frame-wise responses to various speech and
sensory signals are widely adopted on resource-constrained platforms. Hence, increasing …