Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as TensorFlow or Keras, and open-source trained models, along with …

Deep learning in the construction industry: A review of present status and future innovations

TD Akinosho, LO Oyedele, M Bilal, AO Ajayi… - Journal of Building …, 2020 - Elsevier
The construction industry is known to be overwhelmed with resource planning, risk
management and logistic challenges which often result in design defects, project delivery …

Neural codec language models are zero-shot text to speech synthesizers

C Wang, S Chen, Y Wu, Z Zhang, L Zhou, S Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce a language modeling approach for text to speech synthesis (TTS). Specifically,
we train a neural codec language model (called VALL-E) using discrete codes derived from …

YourTTS: Towards zero-shot multi-speaker TTS and zero-shot voice conversion for everyone

E Casanova, J Weber, CD Shulby… - International …, 2022 - proceedings.mlr.press
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker
TTS. Our method builds upon the VITS model and adds several novel modifications for zero …

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

BigVGAN: A universal neural vocoder with large-scale training

S Lee, W Ping, B Ginsburg, B Catanzaro… - arXiv preprint arXiv …, 2022 - arxiv.org
Despite recent progress in generative adversarial network (GAN)-based vocoders, where
the model generates raw waveform conditioned on acoustic features, it is challenging to …

Jukebox: A generative model for music

P Dhariwal, H Jun, C Payne, JW Kim… - arXiv preprint arXiv …, 2020 - assets.pubpub.org
We introduce Jukebox, a model that generates music with singing in the raw audio domain.
We tackle the long context of raw audio using a multiscale VQ-VAE to compress it to discrete …

Generalizing from a few examples: A survey on few-shot learning

Y Wang, Q Yao, JT Kwok, LM Ni - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Machine learning has been highly successful in data-intensive applications but is often
hampered when the data set is small. Recently, Few-shot Learning (FSL) is proposed to …

Few-shot adversarial learning of realistic neural talking head models

E Zakharov, A Shysheya, E Burkov… - Proceedings of the …, 2019 - openaccess.thecvf.com
Several recent works have shown how highly realistic human head images can be obtained
by training convolutional neural networks to generate them. In order to create a personalized …

Joint audio-visual deepfake detection

Y Zhou, SN Lim - Proceedings of the IEEE/CVF International …, 2021 - openaccess.thecvf.com
Deepfakes ("deep learning" + "fake") are synthetically-generated videos from AI
algorithms. While they could be entertaining, they could also be misused for falsifying …