This paper describes a post-evaluation analysis of the system developed by the ViVoLAB research group for the IberSPEECH-RTVE 2020 Multimodal Diarization (MD) Challenge …
The demand for high-quality metadata for the available multimedia content requires the development of new techniques able to correctly identify more and more information …
P Gimeno, A Ortega - Proc. IberSPEECH 2024, 2024 - isca-archive.org
Advances in technology have increased multimedia data generation, making manual analysis impractical and driving the need for automatic tools, often based on deep learning …
Speech Activity Detection (SAD) aims to accurately classify audio fragments containing human speech. Current state-of-the-art systems for the SAD task are mainly based on deep …
Advances in technology over the last decade have reshaped the way the population interacts with multimedia content. This has led to a significant rise both in generation and …
The increasing use of technological devices and biometric recognition systems in people's daily lives has motivated a great deal of research interest in the development of effective and …
Technological advances over the last decade have completely changed the way the population interacts with multimedia content. This has led to a …
O Ghahabi, V Fischer - arXiv preprint arXiv:2106.11075, 2021 - arxiv.org
Speech Activity Detection (SAD), the task of locating speech segments within an audio recording, is a key part of most speech technology applications. Robust SAD is usually more difficult in …