Authors
Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Zhenhui Ye, Shengpeng Ji, Qian Yang, Chen Zhang, Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao
Publication date
2023/7/14
Conference
ICLR 2024
Description
Zero-shot text-to-speech aims to synthesize voices from unseen speech prompts. Previous large-scale multispeaker TTS models have achieved this goal with an enrolled recording of under 10 seconds. However, most of them are designed to utilize only short speech prompts, and the limited information in a short prompt significantly hinders fine-grained identity imitation. In this paper, we introduce Mega-TTS 2, a generic zero-shot multispeaker TTS model capable of synthesizing speech for unseen speakers with arbitrary-length prompts. Specifically, we 1) design a multi-reference timbre encoder to extract timbre information from multiple reference speeches and 2) train a prosody language model with arbitrary-length speech prompts. With these designs, our model is suitable for prompts of different lengths, which extends the upper bound of speech quality for zero-shot text-to-speech. Besides arbitrary-length prompts, we introduce arbitrary-source prompts, which leverage the probabilities derived from multiple P-LLM outputs to produce expressive and controlled prosody. Furthermore, we propose a phoneme-level auto-regressive duration model to introduce in-context learning capabilities into duration modeling. Experiments demonstrate that our method can not only synthesize identity-preserving speech with a short prompt from an unseen speaker but also achieve improved performance with longer speech prompts. Audio samples can be found at https://mega-tts.github.io/mega2_demo/.
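Of the mechanisms described above, the arbitrary-source prompting idea is the most concrete: prosody is generated by combining the next-token probabilities from several P-LLM runs, each conditioned on a different speech prompt. Below is a minimal sketch of that fusion step, assuming the P-LLM emits a softmax distribution over discrete prosody codes at each decoding step; the function name `fuse_plm_probs`, the vocabulary size, and the fixed mixing weights are hypothetical illustrations, not the paper's implementation.

```python
import torch

def fuse_plm_probs(prob_dists, weights=None):
    """Fuse next-token probability distributions from multiple P-LLM runs
    (one per prosody prompt source) by weighted averaging.

    prob_dists: list of 1-D tensors, each a softmax distribution over
                the discrete prosody-code vocabulary.
    weights:    optional 1-D tensor of per-source mixing weights;
                defaults to a uniform average.
    """
    probs = torch.stack(prob_dists)  # (num_sources, vocab_size)
    if weights is None:
        weights = torch.full((len(prob_dists),), 1.0 / len(prob_dists))
    fused = (weights.unsqueeze(-1) * probs).sum(dim=0)
    return fused / fused.sum()  # renormalize to a valid distribution

# Usage: sample the next prosody code from the fused distribution.
# p_a and p_b stand in for P-LLM softmax outputs conditioned on two
# different speech prompts (random here, purely for illustration).
p_a = torch.softmax(torch.randn(1024), dim=-1)
p_b = torch.softmax(torch.randn(1024), dim=-1)
fused = fuse_plm_probs([p_a, p_b], weights=torch.tensor([0.7, 0.3]))
next_code = torch.multinomial(fused, num_samples=1)
```

In this sketch, hand-tuned weights stand in for whatever weighting the actual model derives; the point is only that each prosody code is sampled from a mixture of distributions conditioned on different prompt sources, which is what lets the prosody be steered across sources.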
Total citations
Scholar articles
Z Jiang, J Liu, Y Ren, J He, C Zhang, Z Ye, P Wei… - arXiv preprint arXiv:2307.07218, 2023
Z Jiang, J Liu, Y Ren, J He, Z Ye, S Ji, Q Yang, C Zhang… - The Twelfth International Conference on Learning Representations, 2024