Existing approaches for spatio-temporal action detection in videos are limited by the spatial extent and temporal duration of the actions. In this paper, we present a modular system for …
M Zhou, W Chen, H Sun, W Xie, M Dong… - Knowledge-Based Systems, 2024 - Elsevier
… proposals, which weigh the original video features to generate proposal-… video and query features. Subsequently, a multi-scale proposal generation module is designed to refine video …
… : A simple pipeline for the conventional proposal-based WSVG methods is illustrated in Fig- … After the multi-step refinements, our IRON gradually converges to more complete intervals …
Z Lin, Z Zhao, Z Zhang, Q Wang, H Liu - Proceedings of the AAAI …, 2020 - ojs.aaai.org
… 2019), we design a novel semantic completion module that predicts the important words (e.g. … given by semantic completion module, we compute reward for each proposal based on the …
D Fang, H Xu, W Wei, M Guizani, H Gao - Expert Systems with Applications, 2025 - Elsevier
… Video moment retrieval aims to precisely identify and localize a specific segment within an untrimmed video … , objects, and motions inherent in videos. Additionally, existing approaches …
… : Proposal-based method, Proposal-free method, and Reinforcement Learning-based method. In early works, some proposal-based … between video and query, treat the video as a whole …
H Xuan, Z Wu, J Yang, Y Yan… - Proceedings of the …, 2022 - openaccess.thecvf.com
… In this paper, we advocate a novel proposal-based paradigm that … As a result, our proposal-based sound source localization … Our location map tends to capture larger and more …
… offer a more complete and high-level … proposal-based video saliency detection approach. The two frames in the first row are the original frames from the horses01 and people05 videos …
H Ren, W Yang, T Zhang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
… actions in untrimmed videos with only video-level category … are supervised by the labels of videos. However, the objective for … Step-by-step [64] allows the model to learn more complete …