A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Overview of voice conversion methods based on deep learning

T Walczyna, Z Piotrowski - Applied Sciences, 2023 - mdpi.com
Voice conversion is a process where the essence of a speaker's identity is seamlessly
transferred to another speaker, all while preserving the content of their speech. This usage is …

Diffusion-based voice conversion with fast maximum likelihood sampling scheme

V Popov, I Vovk, V Gogoryan, T Sadekova… - arXiv preprint arXiv …, 2021 - arxiv.org
Voice conversion is a common speech synthesis task which can be solved in different ways
depending on a particular real-world scenario. The most challenging one often referred to as …

FreeVC: Towards high-quality text-free one-shot voice conversion

J Li, W Tu, L Xiao - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
Voice conversion (VC) can be achieved by first extracting source content information and
target speaker information, and then reconstructing waveform with this information …

Emotion intensity and its control for emotional voice conversion

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while
preserving the linguistic content and speaker identity. In EVC, emotions are usually treated …

UniAudio: An audio foundation model toward universal audio generation

D Yang, J Tian, X Tan, R Huang, S Liu, X Chang… - arXiv preprint arXiv …, 2023 - arxiv.org
Language models (LMs) have demonstrated the capability to handle a variety of generative
tasks. This paper presents the UniAudio system, which, unlike prior task-specific …

Make-a-voice: Unified voice synthesis with discrete representation

R Huang, C Zhang, Y Wang, D Yang, L Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Various applications of voice synthesis have been developed independently despite the fact
that they generate "voice" as output in common. In addition, the majority of voice synthesis …

DRVC: A framework of any-to-any voice conversion with self-supervised learning

Q Wang, X Zhang, J Wang, N Cheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Any-to-any voice conversion problem aims to convert voices for source and target speakers,
which are out of the training data. Previous works widely utilize the disentangle-based …

Disentangling content and fine-grained prosody information via hybrid ASR bottleneck features for voice conversion

X Zhao, F Liu, C Song, Z Wu, S Kang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Non-parallel data voice conversion (VC) has achieved considerable breakthroughs
recently through introducing bottleneck features (BNFs) extracted by the automatic speech …

Voice conversion with just nearest neighbors

M Baas, B van Niekerk, H Kamper - arXiv preprint arXiv:2305.18975, 2023 - arxiv.org
Any-to-any voice conversion aims to transform source speech into a target voice with just a
few examples of the target speaker as a reference. Recent methods produce convincing …