Neural voice cloning with a few samples

Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer

Easy access to audio-visual content on social media, combined with the availability of
modern tools such as Tensorflow or Keras, and open-source trained models, along with …

Cited by 391 - Related articles - All 11 versions

[HTML] sciencedirect.com

[HTML] Deep learning in the construction industry: A review of present status and future innovations

TD Akinosho, LO Oyedele, M Bilal, AO Ajayi… - Journal of Building …, 2020 - Elsevier

The construction industry is known to be overwhelmed with resource planning, risk
management and logistic challenges which often result in design defects, project delivery …

Cited by 468 - Related articles - All 9 versions

[PDF] arxiv.org

Neural codec language models are zero-shot text to speech synthesizers

C Wang, S Chen, Y Wu, Z Zhang, L Zhou, S Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce a language modeling approach for text to speech synthesis (TTS). Specifically,
we train a neural codec language model (called VALL-E) using discrete codes derived from …

Cited by 592 - Related articles - All 3 versions

[PDF] mlr.press

YourTTS: Towards zero-shot multi-speaker TTS and zero-shot voice conversion for everyone

E Casanova, J Weber, CD Shulby… - International …, 2022 - proceedings.mlr.press

YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker
TTS. Our method builds upon the VITS model and adds several novel modifications for zero …

Cited by 423 - Related articles - All 7 versions

[PDF] arxiv.org

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 455 - Related articles - All 2 versions

[PDF] arxiv.org

BigVGAN: A universal neural vocoder with large-scale training

S Lee, W Ping, B Ginsburg, B Catanzaro… - arXiv preprint arXiv …, 2022 - arxiv.org

Despite recent progress in generative adversarial network (GAN)-based vocoders, where
the model generates raw waveform conditioned on acoustic features, it is challenging to …

Cited by 213 - Related articles - All 5 versions

[PDF] pubpub.org

[PDF] Jukebox: A generative model for music

P Dhariwal, H Jun, C Payne, JW Kim… - arXiv preprint arXiv …, 2020 - assets.pubpub.org

We introduce Jukebox, a model that generates music with singing in the raw audio domain.
We tackle the long context of raw audio using a multiscale VQ-VAE to compress it to discrete …

Cited by 881 - Related articles - All 8 versions

[PDF] arxiv.org

Generalizing from a few examples: A survey on few-shot learning

Y Wang, Q Yao, JT Kwok, LM Ni - ACM Computing Surveys (CSUR), 2020 - dl.acm.org

Machine learning has been highly successful in data-intensive applications but is often
hampered when the data set is small. Recently, Few-shot Learning (FSL) is proposed to …

Cited by 3573 - Related articles - All 11 versions

[PDF] thecvf.com

Few-shot adversarial learning of realistic neural talking head models

E Zakharov, A Shysheya, E Burkov… - Proceedings of the …, 2019 - openaccess.thecvf.com

Several recent works have shown how highly realistic human head images can be obtained
by training convolutional neural networks to generate them. In order to create a personalized …

Cited by 778 - Related articles - All 17 versions

[PDF] thecvf.com

Joint audio-visual deepfake detection

Y Zhou, SN Lim - Proceedings of the IEEE/CVF International …, 2021 - openaccess.thecvf.com

Deepfakes ("deep learning" + "fake") are synthetically-generated videos from AI
algorithms. While they could be entertaining, they could also be misused for falsifying …

Cited by 176 - Related articles - All 5 versions
