Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences. For movies, this presents …
We present a new loss function called Distribution-Balanced Loss for the multi-label recognition problems that exhibit long-tailed class distributions. Compared to conventional …
This paper attacks the challenging problem of video retrieval by text. In such a retrieval paradigm, an end user searches for unlabeled videos by ad-hoc queries described …
S Liu, H Fan, S Qian, Y Chen… - Proceedings of the …, 2021 - openaccess.thecvf.com
Video-Text Retrieval has been a hot research topic with the growth of multimedia data on the internet. Transformer for video-text learning has attracted increasing attention …
Recent years have seen remarkable advances in visual understanding. However, how to understand a story-based long video with artistic styles, e.g., movie, remains challenging. In …
The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form. Generating high-quality movie AD is challenging due to the …
CY Wu, P Krahenbuhl - … of the IEEE/CVF Conference on …, 2021 - openaccess.thecvf.com
Our world offers a never-ending stream of visual stimuli, yet today's vision systems only accurately recognize patterns within a few seconds. These systems understand the present …
MM Islam, G Bertasius - European Conference on Computer Vision, 2022 - Springer
Most modern video recognition models are designed to operate on short video clips (e.g., 5–10 s in length). Thus, it is challenging to apply such models to long movie understanding …
Media is created by humans for humans to tell stories. There exists a natural and imminent need for creating human-centered media analytics to illuminate the stories being told and to …