Who's speaking? Audio-supervised classification of active speakers in video

Is someone speaking? Exploring long-term temporal features for audio-visual active speaker detection

R Tao, Z Pan, RK Das, X Qian, MZ Shou… - Proceedings of the 29th …, 2021 - dl.acm.org

Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or
more speakers. The successful ASD depends on accurate interpretation of short-term and …

Cited by 191 · Related articles · All 5 versions

[PDF] arxiv.org

AVA Active Speaker: An audio-visual dataset for active speaker detection

J Roth, S Chaudhuri, O Klejch, R Marvin… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

Active speaker detection is an important component in video analysis algorithms for
applications such as speaker diarization, video re-targeting for meetings, speech …

Cited by 193 · Related articles · All 6 versions

[PDF] thecvf.com

Active speakers in context

JL Alcázar, F Caba, L Mai, F Perazzi… - Proceedings of the …, 2020 - openaccess.thecvf.com

Current methods for active speaker detection focus on modeling audiovisual information
from a single speaker. This strategy can be adequate for addressing single-speaker …

Cited by 98 · Related articles · All 8 versions

[PDF] thecvf.com

Egocentric auditory attention localization in conversations

F Ryan, H Jiang, A Shukla… - Proceedings of the …, 2023 - openaccess.thecvf.com

In a noisy conversation environment such as a dinner party, people often exhibit selective
auditory attention, or the ability to focus on a particular speaker while tuning out others …

Cited by 18 · Related articles · All 7 versions

[PDF] thecvf.com

How to design a three-stage architecture for audio-visual active speaker detection in the wild

O Köpüklü, M Taseska, G Rigoll - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

Successful active speaker detection requires a three-stage pipeline: (i) audio-visual
encoding for all speakers in the clip, (ii) inter-speaker relation modeling between a reference …

Cited by 57 · Related articles · All 6 versions

[PDF] thecvf.com

MAAS: Multi-modal assignation for active speaker detection

JL Alcázar, F Caba, AK Thabet… - Proceedings of the …, 2021 - openaccess.thecvf.com

Active speaker detection requires a solid integration of multi-modal cues. While individual
modalities can approximate a solution, accurate predictions can only be achieved by …

Cited by 60 · Related articles · All 8 versions

[PDF] arxiv.org

End-to-end active speaker detection

JL Alcázar, M Cordes, C Zhao, B Ghanem - European Conference on …, 2022 - Springer

Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage
process: feature extraction and spatio-temporal context aggregation. In this paper, we …

Cited by 33 · Related articles · All 7 versions

[PDF] arxiv.org

Cross-modal supervision for learning active speaker detection in video

P Chakravarty, T Tuytelaars - … Amsterdam, The Netherlands, October 11-14 …, 2016 - Springer

In this paper, we show how to use audio to supervise the learning of active speaker
detection in video. Voice Activity Detection (VAD) guides the learning of the vision-based …

Cited by 70 · Related articles · All 8 versions

[PDF] arxiv.org

Listen to look into the future: Audio-visual egocentric gaze anticipation

B Lai, F Ryan, W Jia, M Liu, JM Rehg - European Conference on Computer …, 2024 - Springer

Egocentric gaze anticipation serves as a key building block for the emerging capability of
Augmented Reality. Notably, gaze behavior is driven by both visual cues and audio signals …

Cited by 5 · Related articles · All 2 versions

[PDF] mdpi.com

Microphone array for speaker localization and identification in shared autonomous vehicles

I Marques, J Sousa, B Sá, D Costa, P Sousa, S Pereira… - Electronics, 2022 - mdpi.com

With the current technological transformation in the automotive industry, autonomous
vehicles are getting closer to the Society of Automotive Engineers (SAE) automation level 5 …

Cited by 22 · Related articles · All 8 versions
