Frame-level label refinement for skeleton-based weakly-supervised action recognition

M Tanaka, K Fujiwara - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

This research tackles the problem of generating interaction between two human actors
corresponding to textual description. We claim that certain interactions, which we call …

Cited by 23 Related articles All 4 versions

[PDF] arxiv.org

Text-controlled Motion Mamba: Text-Instructed Temporal Grounding of Human Motion

X Wang, Z Kang, Y Mu - arXiv preprint arXiv:2404.11375, 2024 - arxiv.org

Human motion understanding is a fundamental task with diverse practical applications,
facilitated by the availability of large-scale motion capture datasets. Recent studies focus on …

Cited by 6 Related articles All 2 versions

[PDF] arxiv.org

Chronologically Accurate Retrieval for Temporal Grounding of Motion-Language Models

K Fujiwara, M Tanaka, Q Yu - European Conference on Computer Vision, 2024 - Springer

With the release of large-scale motion datasets with textual annotations, the task of
establishing a robust latent space for language and 3D human motion has recently …

BID: Boundary-Interior Decoding for Unsupervised Temporal Action Localization Pre-Training

Q Fang, C Tang, S Ma, Y Yang - arXiv preprint arXiv:2403.07354, 2024 - arxiv.org

Skeleton-based motion representations are robust for action localization and understanding
for their invariance to perspective, lighting, and occlusion, compared with images. Yet, they …

Cited by 1 Related articles All 2 versions

[PDF] acm.org

Auto-summarization of Human Volumetric Videos

N Gadipudi, CO Fearghail, J Dingliana - Proceedings of the 2024 ACM …, 2024 - dl.acm.org

The rapid expansion of immersive interaction and visualization has led to the emergence of
volumetric videos. However, analyzing information within such content is still in its early …
