SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models

H Xia, Z Yang, J Zou, R Tracy, Y Wang, C Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) are advancing the ability to reason about
complex sports scenarios by integrating textual and visual information. To comprehensively …

A Comprehensive Survey of Action Quality Assessment: Method and Benchmark

K Zhou, R Cai, L Wang, HPH Shum, X Liang - arXiv preprint arXiv …, 2024 - arxiv.org
Action Quality Assessment (AQA) quantitatively evaluates the quality of human actions,
providing automated assessments that reduce biases in human judgment. Its applications …

Pro2Diff: Proposal Propagation for Multi-Object Tracking via the Diffusion Model

H Liu, C Zhang, B Fan, J Xu - IEEE Transactions on Image …, 2024 - ieeexplore.ieee.org
Multi-object tracking (MOT) aims to estimate the bounding boxes and ID labels of objects in
videos. The challenging issue in this task is to alleviate competitive learning between the …

ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition

M Salehi, JS Park, T Yadav, A Kusupati… - arXiv preprint arXiv …, 2024 - arxiv.org
Our world is full of varied actions and moves across specialized domains that we, as
humans, strive to identify and understand. Within any single domain, actions can often …

Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition

Y Chen, J Guo, S Guo, D Tao - arXiv preprint arXiv:2411.11288, 2024 - arxiv.org
Zero-shot skeleton action recognition is a non-trivial task that requires robust unseen
generalization with prior knowledge from only seen classes and shared semantics. Existing …

ShadowPunch: Fast Actions Spotting Benchmark

A Simonyan, N Falaleev - openreview.net
We introduce an open dataset for video event spotting focused on fast-paced events in
shadowboxing videos captured at high frame rates. The dataset features accurate frame …