Unsupervised representation learning for speech activity detection in the fearless steps challenge 2021

P Gimeno, A Ortega, A Miguel, E Lleida - Interspeech 2021, 2021 - hal.science
In this paper, we describe the ViVoLab speech activity detection (SAD) system submitted to
the Fearless Steps Challenge Phase III. This series of challenges has proposed a number of …

Multimodal diarization systems by training enrollment models as identity representations

V Mingote, I Viñals, P Gimeno, A Miguel, A Ortega… - Applied Sciences, 2022 - mdpi.com
This paper describes a post-evaluation analysis of the system developed by the ViVoLAB
research group for the IberSPEECH-RTVE 2020 Multimodal Diarization (MD) Challenge …

The domain mismatch problem in the broadcast speaker attribution task

I Viñals, A Ortega, A Miguel, E Lleida - Applied Sciences, 2021 - mdpi.com
The demand for high-quality metadata for the available multimedia content requires the
development of new techniques able to correctly identify more and more information …

[PDF] Advances in Binary and Multiclass Audio Segmentation with Deep Learning Techniques: A PhD Thesis Overview

P Gimeno, A Ortega - Proc. IberSPEECH 2024, 2024 - isca-archive.org
Advances in technology have increased multimedia data generation, making manual
analysis impractical and driving the need for automatic tools, often based on deep learning …

Unsupervised adaptation of deep speech activity detection models to unseen domains

P Gimeno, D Ribas, A Ortega, A Miguel, E Lleida - Applied Sciences, 2022 - mdpi.com
Speech Activity Detection (SAD) aims to accurately classify audio fragments containing
human speech. Current state-of-the-art systems for the SAD task are mainly based on deep …

[PDF] Advances in Binary and Multiclass Audio Segmentation with Deep Learning Techniques

PG Jordán, AO Giménez - 2023 - researchgate.net
Advances in technology over the last decade have reshaped the way people interact
with multimedia content. This has led to a significant rise in both the generation and …

[PDF] Representation and metric learning advances for deep neural network face and speaker biometric systems

VM Bueno - 2022 - researchgate.net
The increasing use of technological devices and biometric recognition systems in people's
daily lives has motivated a great deal of research interest in the development of effective and …

Advances in Binary and Multiclass Audio Segmentation with Deep Learning Techniques

P Gimeno Jordán, A Ortega Giménez - zaguan.unizar.es
Technological advances over the last decade have completely changed the
way in which people interact with multimedia content. This has brought about a …

EML Online Speech Activity Detection for the Fearless Steps Challenge Phase-III

O Ghahabi, V Fischer - arXiv preprint arXiv:2106.11075, 2021 - arxiv.org
Speech Activity Detection (SAD), locating speech segments within an audio recording, is a
core component of most speech technology applications. Robust SAD is usually more difficult in …