MovieChat: From dense token to sparse memory for long video understanding

E Song, W Chai, G Wang, Y Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recently, integrating video foundation models and large language models to build a video
understanding system can overcome the limitations of specific pre-defined vision tasks. Yet …

Findings of the 2019 conference on machine translation (WMT19)

L Barrault, O Bojar, MR Costa-Jussa, C Federmann… - 2019 - zora.uzh.ch
This paper presents the results of the premier shared task organized alongside the
Conference on Machine Translation (WMT) 2019. Participants were asked to build machine …

Findings of the 2021 conference on machine translation (WMT21)

F Akhbardeh, A Arkhangorodsky, M Biesialska… - Proceedings of the sixth …, 2021 - cris.fbk.eu
This paper presents the results of the news translation task, the multilingual low-resource
translation for Indo-European languages, the triangular translation task, and the automatic …

[BOOK][B] Fundamentals of multimedia

ZN Li, MS Drew, J Liu - 2004 - Springer
In the 17 years since the first edition of Fundamentals of Multimedia, the field and
applications of multimedia have flourished and are undergoing evermore rapid growth and …

Is the reign of interactive search eternal? Findings from the Video Browser Showdown 2020

J Lokoč, P Veselý, F Mejzlík, G Kovalčík… - ACM Transactions on …, 2021 - dl.acm.org
Comprehensive and fair performance evaluation of information retrieval systems represents
an essential task for the current information age. Whereas Cranfield-based evaluations with …

A comprehensive review of the video-to-text problem

J Perez-Martin, B Bustos, SJF Guimaraes… - Artificial Intelligence …, 2022 - Springer
Research in the Vision and Language area encompasses challenging topics that seek to
connect visual and textual information. When the visual information is related to videos, this …

SEA: Sentence encoder assembly for video retrieval by textual queries

X Li, F Zhou, C Xu, J Ji, G Yang - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Retrieving unlabeled videos by textual queries, known as Ad-hoc Video Search (AVS), is a
core theme in multimedia data management and retrieval. The success of AVS counts on …

MultiVENT: Multilingual Videos of Events and Aligned Natural Text

K Sanders, D Etter, R Kriz… - Advances in Neural …, 2023 - proceedings.neurips.cc
Everyday news coverage has shifted from traditional broadcasts towards a wide range of
presentation formats such as first-hand, unedited video footage. Datasets that reflect the …

Considering human perception and memory in interactive multimedia retrieval evaluations

L Rossetto, W Bailer, A Bernstein - International Conference on Multimedia …, 2021 - Springer
Experimental evaluations dealing with visual known-item search tasks, where real users
look for previously observed and memorized scenes in a given video collection, represent a …

Face, body, voice: Video person-clustering with multiple modalities

A Brown, V Kalogeiton… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
The objective of this work is person-clustering in videos--grouping characters according to
their identity. Previous methods focus on the narrower task of face-clustering, and for the …