The rapid progress of multimodal signal processing in recent years has cleared the way for novel applications in human-computer interaction, surveillance, and telecommunication …
We address the challenging Voice Activity Detection (VAD) problem, which determines "Who is Speaking and When?" in audiovisual recordings. The typical audio-based VAD systems …
We propose a new efficient framework, the Unified Context Network (UniCon), for robust active speaker detection (ASD). Traditional methods for ASD usually operate on each …
JV Quiros, C Raman, S Tan, E Gedik… - arXiv preprint arXiv …, 2024 - arxiv.org
Recognizing speaking in humans is a central task towards understanding social interactions. Ideally, speaking would be detected from individual voice recordings, as done …
Listeners use short interjections, so-called backchannels, to signify attention or express agreement. The automatic analysis of this behavior is of key importance for human …
Multimodal analysis of group behavior is a key task in human-computer interaction, and in the social and behavioral sciences, but is often limited to more easily controllable laboratory …
C Yang, M Chen, Y Wang, Y Wang - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Audio-visual speaker diarization refers to the task of identifying "who spoke when" by using both audio and video data. Although previous fusion-based approaches have shown …
Recognizing who is speaking in a crowded scene is a key challenge towards the understanding of the social interactions going on within. Detecting speaking status from …
C Raman, J Vargas Quiros, S Tan… - Advances in …, 2022 - proceedings.neurips.cc
Recording the dynamics of unscripted human interactions in the wild is challenging due to the delicate trade-offs between several factors: participant privacy, ecological validity, data …