[PDF] from arxiv.org

Meta-TTS: Meta-learning for few-shot speaker adaptive text-to-speech

Authors

Sung-Feng Huang, Chyi-Jiunn Lin, Da-Rong Liu, Yi-Chen Chen, Hung-yi Lee

Publication date

2022/4/13

Journal

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Volume

Pages

1558-1571

Publisher

IEEE

Description

Personalizing a speech synthesis system is a highly desired application, where the system can generate speech with the user’s voice with rare enrolled recordings. There are two main approaches to build such a system in recent works: speaker adaptation and speaker encoding. On the one hand, speaker adaptation methods fine-tune a trained multi-speaker text-to-speech (TTS) model with few enrolled samples. However, they require at least thousands of fine-tuning steps for high-quality adaptation, making it hard to apply on devices. On the other hand, speaker encoding methods encode enrollment utterances into a speaker embedding. The trained TTS model can synthesize the user’s speech conditioned on the corresponding speaker embedding. Nevertheless, the speaker encoder suffers from the generalization gap between the seen and unseen speakers. In this paper, we propose applying a meta-learning …

Total citations

Cited by 56

2022: 12, 2023: 29, 2024: 15

Scholar articles

Meta-tts: Meta-learning for few-shot speaker adaptive text-to-speech

SF Huang, CJ Lin, DR Liu, YC Chen, H Lee - IEEE/ACM Transactions on Audio, Speech, and …, 2022
