How can objects help action recognition?

C Zhang, A Gupta, A Zisserman - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

We introduce an object-aware decoder for improving the performance of spatio-temporal
representations on ego-centric videos. The key idea is to enhance object-awareness during …

Cited by 16 · Related articles · All 9 versions

[PDF] springer.com

An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer

What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

Cited by 36 · Related articles · All 7 versions

[PDF] thecvf.com

Object-centric video representation for long-term action anticipation

C Zhang, C Fu, S Wang, N Agarwal… - Proceedings of the …, 2024 - openaccess.thecvf.com

This paper focuses on building object-centric representations for long-term action
anticipation in videos. Our key motivation is that objects provide important cues to recognize …

Cited by 16 · Related articles · All 5 versions

Appearance-Agnostic Representation Learning for Compositional Action Recognition

P Huang, X Shu, R Yan, Z Tu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

The discussion of compositional generalization in action recognition, i.e., Compositional
Action Recognition (CAR), has recently received increasing attention. CAR challenges …

Cited by 3 · Related articles

[PDF] thecvf.com

Bi-Causal: Group Activity Recognition via Bidirectional Causality

Y Zhang, W Liu, D Xu, Z Zhou… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Current approaches in Group Activity Recognition (GAR) predominantly emphasize
Human Relations (HRs) while often neglecting the impact of Human-Object Interactions …

Cited by 2 · Related articles

[PDF] arxiv.org

Extending Video Masked Autoencoders to 128 frames

NB Gundavarapu, L Friedman, R Goyal… - arXiv preprint arXiv …, 2024 - arxiv.org

Video understanding has witnessed significant progress with recent video foundation
models demonstrating strong performance owing to self-supervised pre-training objectives; …

Learning Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition

Y Li, G Chen, B Abramowitz, S Anzellott… - arXiv preprint arXiv …, 2024 - arxiv.org

Few-shot action recognition aims at quickly adapting a pre-trained model to the novel data
with a distribution shift using only a limited number of samples. Key challenges include how …

Cited by 1 · Related articles · All 2 versions

Semantic-Aware Late-Stage Supervised Contrastive Learning for Fine-Grained Action Recognition

Y Pan, Q Zhao, Y Zhang, Z Wang… - IEEE Transactions on …, 2025 - ieeexplore.ieee.org

Fine-grained action recognition typically faces challenges with lower inter-class variances
and higher intra-class variances. Supervised contrastive learning is inherently suitable for …

[PDF] arxiv.org

Diving Deep into Regions: Exploiting Regional Information Transformer for Single Image Deraining

B Li, Z Zhang, H Zheng, X Xu, Y Wei, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Transformer-based Single Image Deraining (SID) methods have achieved remarkable
success, primarily attributed to their robust capability in capturing long-range interactions …

Principles of Visual Tokens for Efficient Video Understanding

X Hao, G Li, SN Gowda, RB Fisher, J Huang… - arXiv preprint arXiv …, 2024 - arxiv.org

Video understanding has made huge strides in recent years, relying largely on the power of
the transformer architecture. As this architecture is notoriously expensive and video is highly …
