Is someone speaking? Exploring long-term temporal features for audio-visual active speaker detection

R Tao, Z Pan, RK Das, X Qian, MZ Shou… - Proceedings of the 29th …, 2021 - dl.acm.org
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or
more speakers. The successful ASD depends on accurate interpretation of short-term and …

AVA active speaker: An audio-visual dataset for active speaker detection

J Roth, S Chaudhuri, O Klejch, R Marvin… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Active speaker detection is an important component in video analysis algorithms for
applications such as speaker diarization, video re-targeting for meetings, speech …

Active speakers in context

JL Alcázar, F Caba, L Mai, F Perazzi… - Proceedings of the …, 2020 - openaccess.thecvf.com
Current methods for active speaker detection focus on modeling audiovisual information
from a single speaker. This strategy can be adequate for addressing single-speaker …

Egocentric auditory attention localization in conversations

F Ryan, H Jiang, A Shukla… - Proceedings of the …, 2023 - openaccess.thecvf.com
In a noisy conversation environment such as a dinner party, people often exhibit selective
auditory attention, or the ability to focus on a particular speaker while tuning out others …

How to design a three-stage architecture for audio-visual active speaker detection in the wild

O Köpüklü, M Taseska, G Rigoll - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Successful active speaker detection requires a three-stage pipeline: (i) audio-visual
encoding for all speakers in the clip, (ii) inter-speaker relation modeling between a reference …

MAAS: Multi-modal assignation for active speaker detection

JL Alcázar, F Caba, AK Thabet… - Proceedings of the …, 2021 - openaccess.thecvf.com
Active speaker detection requires a solid integration of multi-modal cues. While individual
modalities can approximate a solution, accurate predictions can only be achieved by …

End-to-end active speaker detection

JL Alcázar, M Cordes, C Zhao, B Ghanem - European Conference on …, 2022 - Springer
Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage
process: feature extraction and spatio-temporal context aggregation. In this paper, we …

Cross-modal supervision for learning active speaker detection in video

P Chakravarty, T Tuytelaars - … Amsterdam, The Netherlands, October 11-14 …, 2016 - Springer
In this paper, we show how to use audio to supervise the learning of active speaker
detection in video. Voice Activity Detection (VAD) guides the learning of the vision-based …

Listen to look into the future: Audio-visual egocentric gaze anticipation

B Lai, F Ryan, W Jia, M Liu, JM Rehg - European Conference on Computer …, 2024 - Springer
Egocentric gaze anticipation serves as a key building block for the emerging capability of
Augmented Reality. Notably, gaze behavior is driven by both visual cues and audio signals …

Microphone array for speaker localization and identification in shared autonomous vehicles

I Marques, J Sousa, B Sá, D Costa, P Sousa, S Pereira… - Electronics, 2022 - mdpi.com
With the current technological transformation in the automotive industry, autonomous
vehicles are getting closer to the Society of Automotive Engineers (SAE) automation level 5 …