The age of synthetic realities: Challenges and opportunities

JP Cardenuto, J Yang, R Padilha… - … on Signal and …, 2023 - nowpublishers.com
Synthetic realities are digital creations or augmentations that are contextually generated
through the use of Artificial Intelligence (AI) methods, leveraging extensive amounts of data …

Overview of voice conversion methods based on deep learning

T Walczyna, Z Piotrowski - Applied Sciences, 2023 - mdpi.com
Voice conversion is a process where the essence of a speaker's identity is seamlessly
transferred to another speaker, all while preserving the content of their speech. This usage is …

Deepfakes as a threat to a speaker and facial recognition: An overview of tools and attack vectors

A Firc, K Malinka, P Hanáček - Heliyon, 2023 - cell.com
Deepfakes present an emerging threat in cyberspace. Recent developments in machine
learning make deepfakes highly believable, and very difficult to differentiate between what is …

Voice conversion with just nearest neighbors

M Baas, B van Niekerk, H Kamper - arXiv preprint arXiv:2305.18975, 2023 - arxiv.org
Any-to-any voice conversion aims to transform source speech into a target voice with just a
few examples of the target speaker as a reference. Recent methods produce convincing …

MLAAD: The multi-language audio anti-spoofing dataset

NM Müller, P Kawa, WH Choong, E Casanova… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-Speech (TTS) technology brings significant advantages, such as giving a voice to
those with speech impairments, but also enables audio deepfakes and spoofs. The former …

RefXVC: Cross-lingual voice conversion with enhanced reference leveraging

M Zhang, Y Zhou, Y Ren, C Zhang… - … /ACM Transactions on …, 2024 - ieeexplore.ieee.org
This paper proposes RefXVC, a method for cross-lingual voice conversion (XVC) that
leverages reference information to improve conversion performance. Previous XVC works …

Fake the real: Backdoor attack on deep speech classification via voice conversion

Z Ye, T Mao, L Dong, D Yan - arXiv preprint arXiv:2306.15875, 2023 - arxiv.org
Deep speech classification has achieved tremendous success and greatly promoted the
emergence of many real-world applications. However, backdoor attacks present a new …

OpenVoice: Versatile instant voice cloning

Z Qin, W Zhao, X Yu, X Sun - arXiv preprint arXiv:2312.01479, 2023 - arxiv.org
We introduce OpenVoice, a versatile voice cloning approach that requires only a short audio
clip from the reference speaker to replicate their voice and generate speech in multiple …

StreamVC: Real-Time Low-Latency Voice Conversion

Y Yang, Y Kartynnik, Y Li, J Tang, X Li… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We present StreamVC, a streaming voice conversion solution that preserves the content and
prosody of any source speech while matching the voice timbre from any target speech …

Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval

Y Deng, H Tang, X Zhang, N Cheng… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Voice conversion refers to transferring speaker identity with well-preserved content. Better
disentanglement of speech representations leads to better voice conversion. Recent studies …