Neural voice cloning with a few samples

S Arik, J Chen, K Peng, W Ping… - Advances in neural …, 2018 - proceedings.neurips.cc
Advances in Neural Information Processing Systems, 2018 - proceedings.neurips.cc
Abstract
Voice cloning is a highly desired feature for personalized speech interfaces. We introduce a neural voice cloning system that learns to synthesize a person's voice from only a few audio samples. We study two approaches: speaker adaptation and speaker encoding. Speaker adaptation is based on fine-tuning a multi-speaker generative model. Speaker encoding is based on training a separate model to directly infer a new speaker embedding, which will be applied to a multi-speaker generative model. In terms of naturalness of the speech and similarity to the original speaker, both approaches can achieve good performance, even with a few cloning audios. While speaker adaptation can achieve slightly better naturalness and similarity, cloning time and required memory for the speaker encoding approach are significantly less, making it more favorable for low-resource deployment.
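The contrast between the two approaches can be illustrated with a deliberately tiny sketch. It is not the paper's implementation: the "multi-speaker generative model" here is a frozen, noise-free linear map, the speaker encoder is a linear least-squares regressor, and every name (`synthesize`, `adapt_speaker`, `train_encoder`, `encode_speaker`) and every dimension is an assumption made for illustration. Only the workflow difference is meant to carry over: adaptation runs an iterative optimization for each new speaker, while encoding is a single forward pass through a model trained once beforehand.

```python
import numpy as np

rng = np.random.default_rng(0)
D_TEXT, D_EMB, D_AUDIO = 8, 4, 16  # hypothetical toy dimensions

# Frozen toy "multi-speaker generative model" (an assumption, not the paper's
# network): audio_features = A @ text_features + B @ speaker_embedding.
A = rng.normal(size=(D_AUDIO, D_TEXT))
B = rng.normal(size=(D_AUDIO, D_EMB))


def synthesize(texts, emb):
    """Map (n, D_TEXT) text features and one embedding to (n, D_AUDIO) audio."""
    return texts @ A.T + emb @ B.T


def adapt_speaker(texts, audio, steps=2000):
    """Speaker adaptation: fine-tune only the embedding by gradient descent."""
    emb = np.zeros(D_EMB)
    # Step size chosen from the largest singular value of B so the quadratic
    # loss cannot diverge.
    lr = 1.0 / (2.0 * np.linalg.norm(B, 2) ** 2)
    for _ in range(steps):
        resid = synthesize(texts, emb) - audio
        grad = 2.0 * (resid @ B).mean(axis=0)  # gradient of mean squared error
        emb -= lr * grad
    return emb


def train_encoder(n_speakers=200, n_clips=5):
    """Speaker encoding, training phase: regress embeddings from audio alone."""
    feats, embs = [], []
    for _ in range(n_speakers):
        emb = rng.normal(size=D_EMB)
        texts = rng.normal(size=(n_clips, D_TEXT))
        feats.append(synthesize(texts, emb).mean(axis=0))  # pooled audio
        embs.append(emb)
    E, *_ = np.linalg.lstsq(np.array(feats), np.array(embs), rcond=None)
    return E


def encode_speaker(audio, E):
    """Speaker encoding, cloning phase: one forward pass, no optimization."""
    return audio.mean(axis=0) @ E


# A new, unseen speaker with only a few cloning samples.
true_emb = rng.normal(size=D_EMB)
clone_texts = rng.normal(size=(5, D_TEXT))
clone_audio = synthesize(clone_texts, true_emb)

adapted_emb = adapt_speaker(clone_texts, clone_audio)  # thousands of steps
E = train_encoder()                                    # done once, offline
encoded_emb = encode_speaker(clone_audio, E)           # single matrix product

print("adaptation error:", np.linalg.norm(adapted_emb - true_emb))
print("encoding error:  ", np.linalg.norm(encoded_emb - true_emb))
```

Because the toy is linear and noise-free, both routes recover the embedding almost exactly, so it says nothing about the naturalness or similarity gap reported in the abstract. What it does show is where the cost falls: `adapt_speaker` repeats a gradient loop for every new speaker, whereas `encode_speaker` reuses a fixed encoder.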