TRECVID 2019: An evaluation campaign to benchmark video activity detection, video captioning...

E Song, W Chai, G Wang, Y Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recently integrating video foundation models and large language models to build a video
understanding system can overcome the limitations of specific pre-defined vision tasks. Yet …

Cited by: 150 Related articles All 3 versions

[PDF] uzh.ch

Findings of the 2019 conference on machine translation (WMT19)

L Barrault, O Bojar, MR Costa-Jussa, C Federmann… - 2019 - zora.uzh.ch

This paper presents the results of the premier shared task organized alongside the
Conference on Machine Translation (WMT) 2019. Participants were asked to build machine …

Cited by: 767 Related articles All 13 versions

[PDF] fbk.eu

Findings of the 2021 conference on machine translation (WMT21)

F Akhbardeh, A Arkhangorodsky, M Biesialska… - Proceedings of the sixth …, 2021 - cris.fbk.eu

This paper presents the results of the news translation task, the multilingual low-resource
translation for Indo-European languages, the triangular translation task, and the automatic …

Cited by: 191 Related articles All 19 versions

[PDF] academia.edu

[BOOK][B] Fundamentals of multimedia

ZN Li, MS Drew, J Liu - 2004 - Springer

In the 17 years since the first edition of Fundamentals of Multimedia, the field and
applications of multimedia have flourished and are undergoing evermore rapid growth and …

Cited by: 651 Related articles All 15 versions

[PDF] itu.dk

Is the reign of interactive search eternal? findings from the video browser showdown 2020

J Lokoč, P Veselý, F Mejzlík, G Kovalčík… - ACM Transactions on …, 2021 - dl.acm.org

Comprehensive and fair performance evaluation of information retrieval systems represents
an essential task for the current information age. Whereas Cranfield-based evaluations with …

Cited by: 48 Related articles All 7 versions

[PDF] arxiv.org

A comprehensive review of the video-to-text problem

J Perez-Martin, B Bustos, SJF Guimaraes… - Artificial Intelligence …, 2022 - Springer

Research in the Vision and Language area encompasses challenging topics that seek to
connect visual and textual information. When the visual information is related to videos, this …

Cited by: 17 Related articles All 8 versions

[PDF] arxiv.org

SEA: Sentence encoder assembly for video retrieval by textual queries

X Li, F Zhou, C Xu, J Ji, G Yang - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

Retrieving unlabeled videos by textual queries, known as Ad-hoc Video Search (AVS), is a
core theme in multimedia data management and retrieval. The success of AVS counts on …

Cited by: 57 Related articles All 3 versions

[PDF] neurips.cc

MultiVENT: Multilingual Videos of Events and Aligned Natural Text

K Sanders, D Etter, R Kriz… - Advances in Neural …, 2023 - proceedings.neurips.cc

Everyday news coverage has shifted from traditional broadcasts towards a wide range of
presentation formats such as first-hand, unedited video footage. Datasets that reflect the …

Cited by: 7 Related articles All 6 versions

[PDF] uzh.ch

Considering human perception and memory in interactive multimedia retrieval evaluations

L Rossetto, W Bailer, A Bernstein - International Conference on Multimedia …, 2021 - Springer

Experimental evaluations dealing with visual known-item search tasks, where real users
look for previously observed and memorized scenes in a given video collection, represent a …

Cited by: 42 Related articles All 8 versions

[PDF] thecvf.com

Face, body, voice: Video person-clustering with multiple modalities

A Brown, V Kalogeiton… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

The objective of this work is person-clustering in videos--grouping characters according to
their identity. Previous methods focus on the narrower task of face-clustering, and for the …

Cited by: 34 Related articles All 17 versions
