VoxCeleb: Large-scale speaker verification in the wild

A Nagrani, JS Chung, W Xie, A Zisserman - Computer Speech & Language, 2020 - Elsevier
The objective of this work is speaker recognition under noisy and unconstrained conditions.
We make two key contributions. First, we introduce a very large-scale audio-visual dataset …

Self-supervised learning of audio-visual objects from video

T Afouras, A Owens, JS Chung, A Zisserman - Computer Vision–ECCV …, 2020 - Springer
Our objective is to transform a video into a set of discrete audio-visual objects using self-
supervised learning. To this end, we introduce a model that uses attention to localize and …

Is someone speaking? Exploring long-term temporal features for audio-visual active speaker detection

R Tao, Z Pan, RK Das, X Qian, MZ Shou… - Proceedings of the 29th …, 2021 - dl.acm.org
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or
more speakers. The successful ASD depends on accurate interpretation of short-term and …

VoxCeleb: a large-scale speaker identification dataset

A Nagrani, JS Chung, A Zisserman - arXiv preprint arXiv:1706.08612, 2017 - arxiv.org
Most existing datasets for speaker identification contain samples obtained under quite
constrained conditions, and are usually hand-annotated, hence limited in size. The goal of …

Human movement datasets: An interdisciplinary scoping review

T Olugbade, M Bieńkiewicz, G Barbareschi… - ACM Computing …, 2022 - dl.acm.org
Movement dataset reviews exist but are limited in coverage, both in terms of size and
research discipline. While topic-specific reviews clearly have their merit, it is critical to have a …

Out of time: automated lip sync in the wild

JS Chung, A Zisserman - … Vision–ACCV 2016 Workshops: ACCV 2016 …, 2017 - Springer
The goal of this work is to determine the audio-video synchronisation between mouth motion
and speech in a video. We propose a two-stream ConvNet architecture that enables the …

AVA active speaker: An audio-visual dataset for active speaker detection

J Roth, S Chaudhuri, O Klejch, R Marvin… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Active speaker detection is an important component in video analysis algorithms for
applications such as speaker diarization, video re-targeting for meetings, speech …

A light weight model for active speaker detection

J Liao, H Duan, K Feng, W Zhao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Active speaker detection is a challenging task in audio-visual scenarios, with the aim to
detect who is speaking in one or more speaker scenarios. This task has received …

Active Speaker Detection using Audio, Visual and Depth Modalities: A Survey

SNAM Robi, MAZM Ariffin, MAM Izhar, N Ahmad… - IEEE …, 2024 - ieeexplore.ieee.org
The rapid progress of multimodal signal processing in recent years has cleared the way for
novel applications in human-computer interaction, surveillance, and telecommunication …

Learning long-term spatial-temporal graphs for active speaker detection

K Min, S Roy, S Tripathi, T Guha… - European Conference on …, 2022 - Springer
Active speaker detection (ASD) in videos with multiple speakers is a challenging task as it
requires learning effective audiovisual features and spatial-temporal correlations over long …