Huge amounts of unstructured data including image, video, audio, and text are ubiquitously generated and shared, and it is a challenge to protect sensitive personal information in …
This paper introduces a new method to extract speaker embeddings from a deep neural network (DNN) for text-independent speaker verification. Usually, speaker …
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …
E Casanova, J Weber, CD Shulby… - International …, 2022 - proceedings.mlr.press
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero …
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple …
Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on …
The ICASSP 2023 Deep Noise Suppression (DNS) Challenge marks the fifth edition of the DNS challenge series. DNS challenges were organized from 2019 to 2023 to foster …
Current speaker verification techniques rely on a neural network to extract speaker representations. The successful x-vector architecture is a Time Delay Neural Network …
ASVspoof 2021 is the fourth edition in the series of biannual challenges which aim to promote the study of spoofing and the design of countermeasures to protect automatic speaker …