Active speaker detection with audio-visual co-training

P Chakravarty, J Zegers, T Tuytelaars… - Proceedings of the 18th …, 2016 - dl.acm.org
In this work, we show how to co-train a classifier for active speaker detection using
audio-visual data. First, audio Voice Activity Detection (VAD) is used to train a personalized video …

Cross-modal supervision for learning active speaker detection in video

P Chakravarty, T Tuytelaars - … Amsterdam, The Netherlands, October 11-14 …, 2016 - Springer
In this paper, we show how to use audio to supervise the learning of active speaker
detection in video. Voice Activity Detection (VAD) guides the learning of the vision-based …

Audio-visual activity guided cross-modal identity association for active speaker detection

R Sharma, S Narayanan - IEEE Open Journal of Signal …, 2023 - ieeexplore.ieee.org
Active speaker detection in videos addresses associating a source face, visible in the video
frames, with the underlying speech in the audio modality. The two primary sources of …

Who's speaking? Audio-supervised classification of active speakers in video

P Chakravarty, S Mirzaei, T Tuytelaars… - Proceedings of the …, 2015 - dl.acm.org
Active speakers have traditionally been identified in video by detecting their moving lips.
This paper demonstrates the same using spatio-temporal features that aim to capture other …

MAAS: Multi-modal assignation for active speaker detection

JL Alcázar, F Caba, AK Thabet… - Proceedings of the …, 2021 - openaccess.thecvf.com
Active speaker detection requires a solid integration of multi-modal cues. While individual
modalities can approximate a solution, accurate predictions can only be achieved by …

Unsupervised active speaker detection in media content using cross-modal information

R Sharma, S Narayanan - arXiv preprint arXiv:2209.11896, 2022 - arxiv.org
We present a cross-modal unsupervised framework for active speaker detection in media
content such as TV shows and movies. Machine learning advances have enabled …

AVA Active Speaker: An audio-visual dataset for active speaker detection

J Roth, S Chaudhuri, O Klejch, R Marvin… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Active speaker detection is an important component in video analysis algorithms for
applications such as speaker diarization, video re-targeting for meetings, speech …

Active speakers in context

JL Alcázar, F Caba, L Mai, F Perazzi… - Proceedings of the …, 2020 - openaccess.thecvf.com
Current methods for active speaker detection focus on modeling audiovisual information
from a single speaker. This strategy can be adequate for addressing single-speaker …

Learning spatial-temporal graphs for active speaker detection

S Roy, K Min, S Tripathi, T Guha… - arXiv preprint arXiv …, 2021 - arxiv.org
We address the problem of active speaker detection through a new framework, called
SPELL, that learns long-range multimodal graphs to encode the inter-modal relationship …

Is someone speaking? Exploring long-term temporal features for audio-visual active speaker detection

R Tao, Z Pan, RK Das, X Qian, MZ Shou… - Proceedings of the 29th …, 2021 - dl.acm.org
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or
more speakers. The successful ASD depends on accurate interpretation of short-term and …