Fine-grained action retrieval through multiple parts-of-speech embeddings

M Wray, D Larlus, G Csurka… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
We address the problem of cross-modal fine-grained action retrieval between text and video.
Cross-modal retrieval is commonly achieved through learning a shared embedding space …
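
The snippet above names the common mechanism, learning a shared embedding space for video and text. Below is a minimal sketch of that general idea only (not the paper's multi-part-of-speech method), assuming pre-extracted video and text features; the dimensions, projection heads, and symmetric contrastive loss are illustrative choices.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SharedEmbedding(nn.Module):
        """Projects video and text features into a common, L2-normalised space."""
        def __init__(self, video_dim=2048, text_dim=300, embed_dim=256):
            super().__init__()
            self.video_proj = nn.Sequential(nn.Linear(video_dim, embed_dim),
                                            nn.ReLU(),
                                            nn.Linear(embed_dim, embed_dim))
            self.text_proj = nn.Sequential(nn.Linear(text_dim, embed_dim),
                                           nn.ReLU(),
                                           nn.Linear(embed_dim, embed_dim))

        def forward(self, video_feats, text_feats):
            v = F.normalize(self.video_proj(video_feats), dim=-1)
            t = F.normalize(self.text_proj(text_feats), dim=-1)
            return v, t

    def contrastive_loss(v, t, temperature=0.07):
        # Symmetric InfoNCE: matching video/caption pairs lie on the diagonal.
        logits = v @ t.T / temperature
        targets = torch.arange(v.size(0), device=v.device)
        return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

    # Toy usage with random features standing in for pre-extracted descriptors.
    model = SharedEmbedding()
    v, t = model(torch.randn(8, 2048), torch.randn(8, 300))
    contrastive_loss(v, t).backward()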

Rethinking zero-shot video classification: End-to-end training for realistic applications

B Brattoli, J Tighe, F Zhdanov… - Proceedings of the …, 2020 - openaccess.thecvf.com
Trained on large datasets, deep learning (DL) can accurately classify videos into hundreds
of diverse classes. However, video data is expensive to annotate. Zero-shot learning (ZSL) …

Audio-visual generalised zero-shot learning with cross-modal attention and language

OB Mercea, L Riesch, A Koepke… - Proceedings of the …, 2022 - openaccess.thecvf.com
Learning to classify video data from classes not included in the training data, i.e., video-based
zero-shot learning, is challenging. We conjecture that the natural alignment between the …

Cross-modal representation learning for zero-shot action recognition

CC Lin, K Lin, L Wang, Z Liu… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
We present a cross-modal Transformer-based framework, which jointly encodes video data
and text labels for zero-shot action recognition (ZSAR). Our model employs a conceptually …
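
As a rough illustration of jointly encoding video features and label word embeddings with a Transformer (a generic sketch under assumed feature shapes, not the authors' architecture), one can concatenate the two token sequences, read a compatibility score from a learned [CLS] token, and at test time score each unseen label and keep the best match.

    import torch
    import torch.nn as nn

    class JointVideoLabelScorer(nn.Module):
        """Scores how well a sequence of video features matches a label's word embeddings."""
        def __init__(self, feat_dim=512, n_heads=8, n_layers=2):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
            self.cls = nn.Parameter(torch.randn(1, 1, feat_dim) * 0.02)
            self.score = nn.Linear(feat_dim, 1)

        def forward(self, video_tokens, label_tokens):
            # video_tokens: (B, T, D) clip features; label_tokens: (B, L, D) label word embeddings.
            B = video_tokens.size(0)
            x = torch.cat([self.cls.expand(B, -1, -1), video_tokens, label_tokens], dim=1)
            x = self.encoder(x)
            return self.score(x[:, 0]).squeeze(-1)  # one compatibility score per (video, label) pair

    # Zero-shot inference: score one video against every unseen label, pick the best match.
    model = JointVideoLabelScorer()
    video = torch.randn(1, 16, 512)
    labels = torch.randn(5, 3, 512)                  # 5 candidate labels, 3 word tokens each
    scores = model(video.expand(5, -1, -1), labels)
    pred = scores.argmax().item()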

Multi-modal zero-shot dynamic hand gesture recognition

R Rastgoo, K Kiani, S Escalera, M Sabokrou - Expert Systems with …, 2024 - Elsevier
Zero-Shot Learning (ZSL) has rapidly advanced in recent years. Towards overcoming the
annotation bottleneck in the Dynamic Hand Gesture Recognition (DHGR) …

Towards zero-shot sign language recognition

YC Bilge, RG Cinbis… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
This paper tackles the problem of zero-shot sign language recognition (ZSSLR), where the
goal is to leverage models learned over the seen sign classes to recognize the instances of …

Multimodal open-vocabulary video classification via pre-trained vision and language models

R Qian, Y Li, Z Xu, MH Yang, S Belongie… - arXiv preprint arXiv …, 2022 - arxiv.org
Utilizing vision and language models (VLMs) pre-trained on large-scale image-text pairs is
becoming a promising paradigm for open-vocabulary visual recognition. In this work, we …
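
A generic sketch of the open-vocabulary paradigm the snippet describes, using the OpenAI clip package to embed video frames and free-form class prompts in a shared space; this is an illustration only, not the paper's multimodal pipeline, and the class names, prompt template, and frame tensor are placeholders.

    import torch
    import clip

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    class_names = ["playing guitar", "archery", "salsa dancing"]  # arbitrary, unseen at training
    text = clip.tokenize([f"a video of a person {c}" for c in class_names]).to(device)

    frames = torch.randn(8, 3, 224, 224).to(device)  # stand-in for 8 preprocessed video frames

    with torch.no_grad():
        frame_emb = model.encode_image(frames)                   # per-frame embeddings
        video_emb = frame_emb.mean(dim=0, keepdim=True)          # simple temporal pooling
        text_emb = model.encode_text(text)
        video_emb = video_emb / video_emb.norm(dim=-1, keepdim=True)
        text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
        probs = (100.0 * video_emb @ text_emb.T).softmax(dim=-1)

    print(class_names[probs.argmax().item()])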

Quo vadis, skeleton action recognition?

P Gupta, A Thatipelli, A Aggarwal… - International Journal of …, 2021 - Springer
In this paper, we study current and upcoming frontiers across the landscape of skeleton-based
human action recognition. To study skeleton-action recognition in the wild, we …

Zero-shot action recognition with transformer-based video semantic embedding

K Doshi, Y Yilmaz - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
While video action recognition has been an active area of research for several years, zero-shot
action recognition has only recently started gaining traction. In this work, we propose a …

Alignment-uniformity aware representation learning for zero-shot video classification

S Pu, K Zhao, M Zheng - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
Most methods tackle zero-shot video classification by aligning visual-semantic
representations within seen classes, which limits generalization to unseen classes. To …
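
The alignment and uniformity terms referenced in the title are commonly written as l_align = E ||f(x) - f(y)||^alpha over matched pairs and l_uniform = log E exp(-t ||f(x) - f(y)||^2) over all pairs; the sketch below gives these generic definitions in PyTorch (not the paper's exact formulation), assuming L2-normalised visual and semantic embeddings.

    import torch
    import torch.nn.functional as F

    def alignment_loss(v, s, alpha=2):
        # v, s: L2-normalised embeddings of matched (visual, semantic) pairs, shape (N, D).
        return (v - s).norm(dim=1).pow(alpha).mean()

    def uniformity_loss(x, t=2):
        # Pairwise squared distances; pushes embeddings to spread over the hypersphere.
        return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()

    # Toy usage with normalised random embeddings for 32 seen-class samples.
    v = F.normalize(torch.randn(32, 128), dim=-1)
    s = F.normalize(torch.randn(32, 128), dim=-1)
    total = alignment_loss(v, s) + uniformity_loss(v) + uniformity_loss(s)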