AdaSpeech 4: Adaptive text to speech in zero-shot scenarios

Y Wu, X Tan, B Li, L He, S Zhao, R Song, T Qin… - arXiv preprint arXiv …, 2022 - arxiv.org
Adaptive text to speech (TTS) can synthesize new voices in zero-shot scenarios efficiently,
by using a well-trained source TTS model without adapting it on the speech data of new …

Meta-TTS: Meta-learning for few-shot speaker adaptive text-to-speech

SF Huang, CJ Lin, DR Liu, YC Chen… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
Personalizing a speech synthesis system is a highly desired application, where the system
can generate speech in the user's voice from only a few enrolled recordings. There are two main …

The multi-speaker multi-style voice cloning challenge 2021

Q Xie, X Tian, G Liu, K Song, L Xie, Z Wu… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
The Multi-speaker Multi-style Voice Cloning Challenge (M2VoC) aims to provide a common
sizable dataset as well as a fair testbed for the benchmarking of the popular voice cloning …

Fully automated end-to-end fake audio detection

C Wang, J Yi, J Tao, H Sun, X Chen, Z Tian… - Proceedings of the 1st …, 2022 - dl.acm.org
The existing fake audio detection systems often rely on expert experience to design the
acoustic features or manually design the hyperparameters of the network structure …

nnSpeech: Speaker-guided conditional variational autoencoder for zero-shot multi-speaker text-to-speech

B Zhao, X Zhang, J Wang, N Cheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Multi-speaker text-to-speech (TTS) using only a little adaptation data is a challenge in practical
applications. To address that, we propose a zero-shot multi-speaker TTS, named nnSpeech …

Voice filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

A Gabryś, G Huybrechts, MS Ribeiro… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
State-of-the-art text-to-speech (TTS) systems require several hours of recorded speech data
to generate high-quality synthetic speech. When using reduced amounts of training data …

Deep learning-based speaker-adaptive postfiltering with limited adaptation data for embedded text-to-speech synthesis systems

E Eren, C Demiroglu - Computer Speech & Language, 2023 - Elsevier
End-to-end (e2e) speech synthesis systems have become popular with the recent
introduction of text-to-spectrogram conversion systems, such as Tacotron, that use encoder …

Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning

SF Huang, CP Chen, ZS Chen… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Personalized TTS is an exciting and highly desired application that allows users to train their
TTS voice using only a few recordings. However, TTS training typically requires many hours …