Action2Vec: A crossmodal embedding approach to action learning

M Hahn, A Silva, JM Rehg - arXiv preprint arXiv:1901.00484, 2019 - arxiv.org
We describe a novel cross-modal embedding space for actions, named Action2Vec, which
combines linguistic cues from class labels with spatio-temporal features derived from video …

Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition

KY Lin, H Ding, J Zhou, YX Peng, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Contrastive Language-Image Pretraining (CLIP) has shown remarkable open-vocabulary
abilities across various image understanding tasks. Building upon this impressive success …

Home Action Genome: Cooperative compositional action understanding

N Rai, H Chen, J Ji, R Desai… - Proceedings of the …, 2021 - openaccess.thecvf.com
Existing research on action recognition treats activities as monolithic events occurring in
videos. Recently, the benefits of formulating actions as a combination of atomic actions have …

Open set action recognition via multi-label evidential learning

C Zhao, D Du, A Hoogs, C Funk - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Existing methods for open set action recognition focus on novelty detection that assumes
video clips show a single action, which is unrealistic in the real world. We propose a new …

PromptonomyViT: Multi-task prompt learning improves video transformers using synthetic scene data

R Herzig, O Abramovich… - Proceedings of the …, 2024 - openaccess.thecvf.com
Action recognition models have achieved impressive results by incorporating scene-level
annotations, such as objects, their relations, 3D structure, and more. However, obtaining …

Learn2Augment: Learning to composite videos for data augmentation in action recognition

SN Gowda, M Rohrbach, F Keller… - European conference on …, 2022 - Springer
We address the problem of data augmentation for video action recognition. Standard
augmentation strategies in video are hand-designed and sample the space of possible …

More is less: Learning efficient video representations by big-little network and depthwise temporal aggregation

Q Fan, CFR Chen, H Kuehne… - Advances in Neural …, 2019 - proceedings.neurips.cc
Current state-of-the-art models for video action recognition are mostly based on expensive
3D ConvNets. This results in a need for large GPU clusters to train and evaluate such …

Intra- and inter-action understanding via temporal action parsing

D Shao, Y Zhao, B Dai, D Lin - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Current methods for action recognition primarily rely on deep convolutional networks to
derive feature embeddings of visual and motion features. While these methods have …

Large-scale weakly-supervised pre-training for video action recognition

D Ghadiyaram, D Tran… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Current fully-supervised video datasets consist of only a few hundred thousand videos and
fewer than a thousand domain-specific labels. This hinders the progress towards advanced …

ActionCLIP: A new paradigm for video action recognition

M Wang, J Xing, Y Liu - arXiv preprint arXiv:2109.08472, 2021 - arxiv.org
The canonical approach to video action recognition dictates that a neural model perform a classic and standard 1-of-N majority vote task. Models are trained to predict a fixed set of predefined …