An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition

KY Lin, H Ding, J Zhou, YX Peng, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Contrastive Language-Image Pretraining (CLIP) has shown remarkable open-vocabulary
abilities across various image understanding tasks. Building upon this impressive success …
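For context on the mechanism this entry builds on: CLIP performs open-vocabulary recognition by embedding an image and a set of free-form label prompts into a shared space and ranking labels by cosine similarity. A minimal zero-shot sketch using the Hugging Face transformers API (the checkpoint name, frame path, and label set are illustrative, not from the paper):

    from PIL import Image
    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("frame.jpg")  # e.g. a single video frame
    labels = ["cutting a vegetable", "washing hands", "pouring water"]
    inputs = processor(text=[f"a photo of {l}" for l in labels],
                       images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_image holds image-text similarity scores; softmax ranks the labels
    probs = out.logits_per_image.softmax(dim=-1)
    print(dict(zip(labels, probs[0].tolist())))

Because the label set is plain text, new action categories can be added at inference time, which is what makes cross-domain open-vocabulary action recognition feasible in the first place.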

A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives

SA Peirone, F Pistilli, A Alliegro… - Proceedings of the …, 2024 - openaccess.thecvf.com
Human comprehension of a video stream is naturally broad: in a few instants we are able to
understand what is happening, the relevance and relationship of objects, and forecast what …

A survey on deep learning techniques for action anticipation

Z Zhong, M Martin, M Voit, J Gall, J Beyerer - arXiv preprint arXiv …, 2023 - arxiv.org
The ability to anticipate possible future human actions is essential for a wide range of
applications, including autonomous driving and human-robot interaction. Consequently …

What does CLIP know about peeling a banana?

C Cuttano, G Rosi, G Trivigno… - Proceedings of the …, 2024 - openaccess.thecvf.com
Humans show an innate capability to identify tools to support specific actions. The
association between object parts and the actions they facilitate is usually named …

Human-Centric Transformer for Domain Adaptive Action Recognition

KY Lin, J Zhou, WS Zheng - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
We study the domain adaptation task for action recognition, namely domain adaptive action
recognition, which aims to effectively transfer action recognition power from a label-sufficient …

Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection? An Investigation and the HOI-Synth Domain Adaptation Benchmark

R Leonardi, A Furnari, F Ragusa… - arXiv preprint arXiv …, 2023 - arxiv.org
In this study, we investigate the effectiveness of synthetic data in enhancing hand-object
interaction detection within the egocentric vision domain. We introduce a simulator able to …

AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation

L Mur-Labadia, R Martinez-Cantin, J Guerrero… - arXiv preprint arXiv …, 2024 - arxiv.org
Short-Term object-interaction Anticipation consists of detecting the location of the next-active
objects, the noun and verb categories of the interaction, and the time to contact from the …

EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?

B Xu, Z Wang, Y Du, S Zheng, Z Song, Q Jin - arXiv preprint arXiv …, 2024 - arxiv.org
Egocentric video-language pretraining is a crucial paradigm to advance the learning of
egocentric hand-object interactions (EgoHOI). Despite the great success on existing …
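For context, pretraining objectives in this line of work (EgoNCE and its variants) build on the symmetric InfoNCE contrastive loss over paired video and text embeddings. A minimal PyTorch sketch of the generic base loss, not the paper's EgoNCE++ objective:

    import torch
    import torch.nn.functional as F

    def symmetric_info_nce(video_emb, text_emb, temperature=0.07):
        """Generic symmetric InfoNCE over N matched video/text pairs."""
        v = F.normalize(video_emb, dim=-1)       # (N, D) unit-norm video features
        t = F.normalize(text_emb, dim=-1)        # (N, D) unit-norm text features
        logits = v @ t.T / temperature           # (N, N) similarity matrix
        targets = torch.arange(len(v), device=v.device)  # diagonal entries are positives
        return (F.cross_entropy(logits, targets)
                + F.cross_entropy(logits.T, targets)) / 2

EgoNCE-style variants chiefly change how positives and negatives are sampled (e.g. grouping clips that share nouns or verbs), which is where hand-object interaction semantics enter the objective.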

Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition

M Hatano, R Hachiuma, R Fuji, H Saito - arXiv preprint arXiv:2405.19917, 2024 - arxiv.org
We address a novel cross-domain few-shot learning task (CD-FSL) with multimodal input
and unlabeled target data for egocentric action recognition. This paper simultaneously …