Rethinking weakly-supervised video temporal grounding from a game perspective

X Fang, Z Xiong, W Fang, X Qu, C Chen, J Dong… - … on Computer Vision, 2025 - Springer
This paper addresses the challenging task of weakly-supervised video temporal grounding.
Existing approaches are generally based on the moment proposal selection framework that …

Not all inputs are valid: Towards open-set video moment retrieval using language

X Fang, W Fang, D Liu, X Qu, J Dong, P Zhou… - Proceedings of the …, 2024 - dl.acm.org
Video Moment Retrieval (VMR) aims to retrieve the specific moment corresponding to a
sentence query from an untrimmed video. Although recent respectable works have made …

Rethinking Video Sentence Grounding From a Tracking Perspective With Memory Network and Masked Attention

Z Xiong, D Liu, X Fang, X Qu, J Dong… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Video sentence grounding (VSG) is the task of identifying the segment of an untrimmed
video that semantically corresponds to a given natural language query. While many existing …

Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network

X Fang, W Fang, C Wang, D Liu, K Tang… - arXiv preprint arXiv …, 2024 - arxiv.org
Given some video-query pairs with untrimmed videos and sentence queries, temporal
sentence grounding (TSG) aims to locate query-relevant segments in these videos. Although …

Repetitive Action Counting with Hybrid Temporal Relation Modeling

K Li, X Peng, D Guo, X Yang, M Wang - arXiv preprint arXiv:2412.07233, 2024 - arxiv.org
Repetitive Action Counting (RAC) aims to count the number of repetitive actions occurring in
videos. In the real world, repetitive actions have great diversity and bring numerous …

DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection

H Zhao, KQ Lin, R Yan, Z Li - IEEE Transactions on Neural …, 2024 - ieeexplore.ieee.org
Video moment retrieval and highlight detection have received attention in the current era of
video content proliferation, aiming to localize moments and estimate clip relevances based …

DiffDesign: Controllable Diffusion with Meta Prior for Efficient Interior Design Generation

Y Yang, J Wang, T Geng, W Qiang, C Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Interior design is a complex and creative discipline involving aesthetics, functionality,
ergonomics, and materials science. Effective solutions must meet diverse requirements …

Dual-task Mutual Reinforcing Embedded Joint Video Paragraph Retrieval and Grounding

M Wang, H Li, Y Zhang, J Li, M Xie, D Tao - arXiv preprint arXiv …, 2024 - arxiv.org
Video Paragraph Grounding (VPG) aims to precisely locate the most appropriate moments
within a video that are relevant to a given textual paragraph query. However, existing …

Improving the Transferability of 3D Point Cloud Attack via Spectral-aware Admix and Optimization Designs

S Hu, D Liu, W Hu - arXiv preprint arXiv:2412.12626, 2024 - arxiv.org
Deep learning models for point clouds have been shown to be vulnerable to adversarial attacks,
which have received increasing attention in various safety-critical applications such as …

Towards Robust Temporal Activity Localization Learning with Noisy Labels

D Liu, X Qu, X Fang, J Dong, P Zhou… - Proceedings of the …, 2024 - aclanthology.org
This paper addresses the task of temporal activity localization (TAL). Although recent works
have made significant progress in TAL research, almost all of them implicitly assume that the …