E Casanova, J Weber, CD Shulby… - International …, 2022 - proceedings.mlr.press
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero …
T Walczyna, Z Piotrowski - Applied sciences, 2023 - mdpi.com
Voice conversion is a process where the essence of a speaker's identity is seamlessly transferred to another speaker, all while preserving the content of their speech. This usage is …
Y Lu, J Chai, X Cao - ACM Transactions on Graphics (ToG), 2021 - dl.acm.org
To the best of our knowledge, we first present a live system that generates personalized photorealistic talking-head animation only driven by audio signals at over 30 fps. Our system …
Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text, such as politeness, emotion, humor, and many …
HS Choi, J Lee, W Kim, J Lee… - Advances in Neural …, 2021 - proceedings.neurips.cc
We present a neural analysis and synthesis (NANSY) framework that can manipulate the voice, pitch, and speed of an arbitrary speech signal. Most of the previous works have …
We present a method that generates expressive talking-head videos from a single facial image with audio as the only input. In contrast to previous attempts to learn direct mappings …
Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to …
The voice conversion challenge is a bi-annual scientific event held to compare and understand different voice conversion (VC) systems built on a common dataset. In 2020, we …
While recent advances in deep neural networks have made it possible to render high-quality images, generating photo-realistic and personalized talking head remains challenging. With …