Self-chained image-language model for video localization and question answering

S Yu, J Cho, P Yadav, M Bansal - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent studies have shown promising results on utilizing large pre-trained image-language
models for video question answering. While these image-language models can efficiently …

Learning to predict activity progress by self-supervised video alignment

G Donahue, E Elhamifar - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
In this paper we tackle the problem of self-supervised video alignment and activity progress
prediction using in-the-wild videos. Our proposed self-supervised representation learning …

An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

Online human motion analysis in industrial context: A review

T Benmessabih, R Slama, V Havard… - Engineering Applications of …, 2024 - Elsevier
Human motion analysis plays a crucial role in Industry 4.0 and, more recently, in Industry 5.0,
where human-centered applications are becoming increasingly important, demonstrating its …

VideoEval: Comprehensive benchmark suite for low-cost evaluation of video foundation model

X Li, Z Huang, J Wang, K Li, L Wang - arXiv preprint arXiv:2407.06491, 2024 - arxiv.org
With the growth of high-quality data and advancement in visual pre-training paradigms,
Video Foundation Models (VFMs) have made significant progress recently, demonstrating …

Beyond Accuracy: Statistical Measures and Benchmark for Evaluation of Representation from Self-Supervised Learning

J Wu, S Mo, S Atito, J Kittler, Z Feng… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, self-supervised metric learning has raised attention for its potential to learn a
generic distance function. It overcomes the limitations of the conventional supervised one, e.g. …

TransferAttn: Transferable-guided Attention Is All You Need for Video Domain Adaptation

A Sacilotti, SF Santos, N Sebe, J Almeida - arXiv preprint arXiv …, 2024 - arxiv.org
Unsupervised domain adaptation (UDA) in videos is a challenging task that remains not well
explored compared to image-based UDA techniques. Although vision transformers (ViT) …

Text-Enhanced Zero-Shot Action Recognition: A training-free approach

M Bosetti, S Zhang, B Liberatori, G Zara, E Ricci… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language models (VLMs) have demonstrated remarkable performance across
various visual tasks, leveraging joint learning of visual and textual representations. While …

[CITATION][C] React to this! How humans challenge interactive agents using nonverbal behaviors

C Zhang - 2024 - Simon Fraser University