Helping hands: An object-aware ego-centric video recognition model

C Zhang, A Gupta, A Zisserman - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We introduce an object-aware decoder for improving the performance of spatio-temporal
representations on ego-centric videos. The key idea is to enhance object-awareness during …

An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

Object-centric video representation for long-term action anticipation

C Zhang, C Fu, S Wang, N Agarwal… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper focuses on building object-centric representations for long-term action
anticipation in videos. Our key motivation is that objects provide important cues to recognize …

Appearance-Agnostic Representation Learning for Compositional Action Recognition

P Huang, X Shu, R Yan, Z Tu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The discussion of compositional generalization in action recognition, i.e., Compositional
Action Recognition (CAR), has recently received increasing attention. CAR challenges …

Bi-Causal: Group Activity Recognition via Bidirectional Causality

Y Zhang, W Liu, D Xu, Z Zhou… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Current approaches in Group Activity Recognition (GAR) predominantly emphasize
Human Relations (HRs) while often neglecting the impact of Human-Object Interactions …

Extending Video Masked Autoencoders to 128 frames

NB Gundavarapu, L Friedman, R Goyal… - arXiv preprint arXiv …, 2024 - arxiv.org
Video understanding has witnessed significant progress with recent video foundation
models demonstrating strong performance owing to self-supervised pre-training objectives; …

Learning Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition

Y Li, G Chen, B Abramowitz, S Anzellott… - arXiv preprint arXiv …, 2024 - arxiv.org
Few-shot action recognition aims at quickly adapting a pre-trained model to the novel data
with a distribution shift using only a limited number of samples. Key challenges include how …

Semantic-Aware Late-Stage Supervised Contrastive Learning for Fine-Grained Action Recognition

Y Pan, Q Zhao, Y Zhang, Z Wang… - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
Fine-grained action recognition typically faces challenges with lower inter-class variances
and higher intra-class variances. Supervised contrastive learning is inherently suitable for …

Diving Deep into Regions: Exploiting Regional Information Transformer for Single Image Deraining

B Li, Z Zhang, H Zheng, X Xu, Y Wei, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer-based Single Image Deraining (SID) methods have achieved remarkable
success, primarily attributed to their robust capability in capturing long-range interactions …

Principles of Visual Tokens for Efficient Video Understanding

X Hao, G Li, SN Gowda, RB Fisher, J Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Video understanding has made huge strides in recent years, relying largely on the power of
the transformer architecture. As this architecture is notoriously expensive and video is highly …