- Academic resource search

A survey on video moment localization

M Liu, L Nie, Y Wang, M Wang, Y Rui - ACM Computing Surveys, 2023 - dl.acm.org

Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …

Cited by 27 | Related articles | All 4 versions

[PDF] thecvf.com

Ego4d: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

Cited by 776 | Related articles | All 13 versions

[PDF] neurips.cc

Egocentric video-language pretraining

KQ Lin, J Wang, M Soldan, M Wray… - Advances in …, 2022 - proceedings.neurips.cc

Abstract Video-Language Pretraining (VLP), which aims to learn transferable representation
to advance a wide range of video-text downstream tasks, has recently received increasing …

Cited by 143 | Related articles | All 8 versions

[PDF] thecvf.com

Univtg: Towards unified video-language temporal grounding

KQ Lin, P Zhang, J Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Abstract Video Temporal Grounding (VTG), which aims to ground target clips from videos
(such as consecutive intervals or disjoint shots) according to custom language queries (e.g. …

Cited by 62 | Related articles | All 4 versions

[PDF] neurips.cc

Momentdiff: Generative video moment retrieval from random to real

P Li, CW Xie, H Xie, L Zhao, L Zhang… - Advances in neural …, 2024 - proceedings.neurips.cc

Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …

Cited by 39 | Related articles | All 6 versions

[PDF] arxiv.org

Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Cited by 36 | Related articles | All 8 versions

[PDF] neurips.cc

Detecting moments and highlights in videos via natural language queries

J Lei, TL Berg, M Bansal - Advances in Neural Information …, 2021 - proceedings.neurips.cc

Detecting customized moments and highlights from videos given natural language (NL) user
queries is an important but under-studied topic. One of the challenges in pursuing this …

Cited by 171 | Related articles | All 7 versions

[PDF] thecvf.com

Query-dependent video representation for moment retrieval and highlight detection

WJ Moon, S Hyun, SU Park, D Park… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recently, video moment retrieval and highlight detection (MR/HD) are being spotlighted as
the demand for video understanding is drastically increased. The key objective of MR/HD is …

Cited by 62 | Related articles | All 5 versions

[PDF] thecvf.com

Relaxed transformer decoders for direct action proposal generation

J Tan, J Tang, L Wang, G Wu - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

Temporal action proposal generation is an important and challenging task in video
understanding, which aims at detecting all temporal segments containing action instances of …

Cited by 201 | Related articles | All 6 versions

[PDF] thecvf.com

Tubedetr: Spatio-temporal video grounding with transformers

A Yang, A Miech, J Sivic, I Laptev… - Proceedings of the …, 2022 - openaccess.thecvf.com

We consider the problem of localizing a spatio-temporal tube in a video corresponding to a
given text query. This is a challenging task that requires the joint and efficient modeling of …

Cited by 87 | Related articles | All 10 versions
