Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as TensorFlow or Keras, and open-source trained models, along with …

YourTTS: Towards zero-shot multi-speaker TTS and zero-shot voice conversion for everyone

E Casanova, J Weber, CD Shulby… - International …, 2022 - proceedings.mlr.press
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker
TTS. Our method builds upon the VITS model and adds several novel modifications for zero …
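
The snippet describes zero-shot multi-speaker synthesis built on VITS. As a rough illustration (not the authors' code), the sketch below drives a published YourTTS checkpoint through the open-source Coqui TTS package; the model-name string, reference clip, and output path are assumptions for the example.

```python
# Hedged sketch: zero-shot multi-speaker synthesis with a public YourTTS
# checkpoint via Coqui TTS (pip install TTS). Model name and file paths are
# assumptions for illustration, not taken from the cited paper.
from TTS.api import TTS

# Load the multilingual YourTTS model from the Coqui model zoo.
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts")

# Zero-shot cloning: a few seconds of an unseen speaker's audio condition the
# synthesis, with no fine-tuning of the model.
tts.tts_to_file(
    text="This sentence is rendered in the reference speaker's voice.",
    speaker_wav="reference_speaker.wav",  # assumed short clip of the target voice
    language="en",
    file_path="zero_shot_output.wav",
)
```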

Overview of voice conversion methods based on deep learning

T Walczyna, Z Piotrowski - Applied sciences, 2023 - mdpi.com
Voice conversion is a process where the essence of a speaker's identity is seamlessly
transferred to another speaker, all while preserving the content of their speech. This usage is …
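
To make the recombination idea behind many deep-learning voice conversion systems concrete, here is a minimal, hypothetical PyTorch skeleton (not from the cited survey): a content encoder maps the source utterance to a nominally speaker-free code, a target-speaker embedding reintroduces identity, and a decoder produces the converted features.

```python
# Hypothetical sketch of the content/speaker split common in deep-learning VC.
import torch
import torch.nn as nn

class TinyVoiceConverter(nn.Module):
    def __init__(self, n_mels=80, content_dim=128, speaker_dim=64, n_speakers=10):
        super().__init__()
        # Content encoder: source mel frames -> (nominally) speaker-free code.
        self.content_encoder = nn.GRU(n_mels, content_dim, batch_first=True)
        # Target-speaker lookup table (a speaker encoder could replace it).
        self.speaker_embedding = nn.Embedding(n_speakers, speaker_dim)
        # Decoder: content code + target identity -> converted mel frames.
        self.decoder = nn.GRU(content_dim + speaker_dim, n_mels, batch_first=True)

    def forward(self, source_mels, target_speaker_id):
        # source_mels: (batch, frames, n_mels)
        content, _ = self.content_encoder(source_mels)
        spk = self.speaker_embedding(target_speaker_id)           # (batch, speaker_dim)
        spk = spk.unsqueeze(1).expand(-1, content.size(1), -1)    # broadcast over frames
        converted, _ = self.decoder(torch.cat([content, spk], dim=-1))
        return converted

# Example: convert a 300-frame utterance to speaker id 4.
model = TinyVoiceConverter()
out = model(torch.randn(1, 300, 80), torch.tensor([4]))
print(out.shape)  # torch.Size([1, 300, 80])
```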

Live speech portraits: real-time photorealistic talking-head animation

Y Lu, J Chai, X Cao - ACM Transactions on Graphics (ToG), 2021 - dl.acm.org
To the best of our knowledge, ours is the first live system that generates personalized
photorealistic talking-head animation driven only by audio signals at over 30 fps. Our system …

Deep learning for text style transfer: A survey

D Jin, Z Jin, Z Hu, O Vechtomova… - Computational …, 2022 - direct.mit.edu
Text style transfer is an important task in natural language generation, which aims to control
certain attributes in the generated text, such as politeness, emotion, humor, and many …

Neural analysis and synthesis: Reconstructing speech from self-supervised representations

HS Choi, J Lee, W Kim, J Lee… - Advances in Neural …, 2021 - proceedings.neurips.cc
We present a neural analysis and synthesis (NANSY) framework that can manipulate the
voice, pitch, and speed of an arbitrary speech signal. Most of the previous works have …

MakeItTalk: speaker-aware talking-head animation

Y Zhou, X Han, E Shechtman, J Echevarria… - ACM Transactions On …, 2020 - dl.acm.org
We present a method that generates expressive talking-head videos from a single facial
image with audio as the only input. In contrast to previous attempts to learn direct mappings …

ContentVec: An improved self-supervised speech representation by disentangling speakers

K Qian, Y Zhang, H Gao, J Ni, CI Lai… - International …, 2022 - proceedings.mlr.press
Self-supervised learning in speech involves training a speech representation network on a
large-scale unannotated speech corpus, and then applying the learned representations to …
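
For readers unfamiliar with this pipeline, the sketch below shows the generic feature-extraction step: a network pretrained on unannotated speech yields frame-level representations for downstream use. It uses torchaudio's HuBERT bundle as a stand-in, since the ContentVec checkpoint is distributed separately, and the input file name is an assumption.

```python
# Hedged sketch of self-supervised speech feature extraction. HuBERT (via
# torchaudio) stands in for ContentVec; the input path is an assumption.
import torch
import torchaudio

bundle = torchaudio.pipelines.HUBERT_BASE
model = bundle.get_model().eval()

waveform, sr = torchaudio.load("speech.wav")  # assumed input file
if sr != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    # One tensor of frame-level features per transformer layer.
    features, _ = model.extract_features(waveform)

print(len(features), features[-1].shape)  # e.g. 12 layers, (1, frames, 768)
```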

Voice conversion challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion

Y Zhao, WC Huang, X Tian, J Yamagishi… - arXiv preprint arXiv …, 2020 - arxiv.org
The voice conversion challenge is a biennial scientific event held to compare and
understand different voice conversion (VC) systems built on a common dataset. In 2020, we …

DFA-NeRF: Personalized talking head generation via disentangled face attributes neural rendering

S Yao, RZ Zhong, Y Yan, G Zhai, X Yang - arXiv preprint arXiv:2201.00791, 2022 - arxiv.org
While recent advances in deep neural networks have made it possible to render high-quality
images, generating photo-realistic and personalized talking heads remains challenging. With …