A graph-based framework to bridge movies and synopses

K Mangalam, R Akshulakov… - Advances in Neural …, 2023 - proceedings.neurips.cc

We introduce EgoSchema, a very long-form video question-answering dataset, and
benchmark to evaluate long video understanding capabilities of modern vision and …

Cited by 153 · Related articles · All 5 versions

AutoAD II: The sequel - who, when, and what in movie audio description

T Han, M Bain, A Nagrani, G Varol… - Proceedings of the …, 2023 - openaccess.thecvf.com

Audio Description (AD) is the task of generating descriptions of visual content, at suitable
time intervals, for the benefit of visually impaired audiences. For movies, this presents …

Cited by 39 · Related articles · All 7 versions

Distribution-balanced loss for multi-label classification in long-tailed datasets

T Wu, Q Huang, Z Liu, Y Wang, D Lin - … , Glasgow, UK, August 23–28, 2020 …, 2020 - Springer

We present a new loss function called Distribution-Balanced Loss for the multi-label
recognition problems that exhibit long-tailed class distributions. Compared to conventional …

Cited by 292 · Related articles · All 6 versions

Dual encoding for video retrieval by text

J Dong, X Li, C Xu, X Yang, G Yang… - … on Pattern Analysis …, 2021 - ieeexplore.ieee.org

This paper attacks the challenging problem of video retrieval by text. In such a retrieval
paradigm, an end user searches for unlabeled videos by ad-hoc queries described …

Cited by 233 · Related articles · All 7 versions

HiT: Hierarchical transformer with momentum contrast for video-text retrieval

S Liu, H Fan, S Qian, Y Chen… - Proceedings of the …, 2021 - openaccess.thecvf.com

Video-Text Retrieval has been a hot research topic with the growth of multimedia
data on the internet. Transformer for video-text learning has attracted increasing attention …

Cited by 181 · Related articles · All 6 versions

MovieNet: A holistic dataset for movie understanding

Q Huang, Y Xiong, A Rao, J Wang, D Lin - Computer Vision–ECCV 2020 …, 2020 - Springer

Recent years have seen remarkable advances in visual understanding. However, how to
understand a story-based long video with artistic styles, e.g., movie, remains challenging. In …

Cited by 262 · Related articles · All 4 versions

AutoAD: Movie description in context

T Han, M Bain, A Nagrani, G Varol… - Proceedings of the …, 2023 - openaccess.thecvf.com

The objective of this paper is an automatic Audio Description (AD) model that ingests movies
and outputs AD in text form. Generating high-quality movie AD is challenging due to the …

Cited by 58 · Related articles · All 7 versions

Towards long-form video understanding

CY Wu, P Krahenbuhl - … of the IEEE/CVF Conference on …, 2021 - openaccess.thecvf.com

Our world offers a never-ending stream of visual stimuli, yet today's vision systems only
accurately recognize patterns within a few seconds. These systems understand the present …

Cited by 165 · Related articles · All 12 versions

Long movie clip classification with state-space video models

MM Islam, G Bertasius - European Conference on Computer Vision, 2022 - Springer

Most modern video recognition models are designed to operate on short video clips (e.g.,
5–10 s in length). Thus, it is challenging to apply such models to long movie understanding …

Cited by 95 · Related articles · All 3 versions

Computational media intelligence: Human-centered machine analysis of media

K Somandepalli, T Guha, VR Martinez… - Proceedings of the …, 2021 - ieeexplore.ieee.org

Media is created by humans for humans to tell stories. There exists a natural and imminent
need for creating human-centered media analytics to illuminate the stories being told and to …

Cited by 41 · Related articles · All 4 versions
