A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier
In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

Disentangling voice and content with self-supervision for speaker recognition

T Liu, KA Lee, Q Wang, H Li - Advances in Neural …, 2023 - proceedings.neurips.cc
For speaker recognition, it is difficult to extract an accurate speaker representation from
speech because of its mixture of speaker traits and content. This paper proposes a …

Cyclegan-vc3: Examining and improving cyclegan-vcs for mel-spectrogram conversion

T Kaneko, H Kameoka, K Tanaka, N Hojo - arXiv preprint arXiv …, 2020 - arxiv.org
Non-parallel voice conversion (VC) is a technique for learning mappings between source
and target speeches without using a parallel corpus. Recently, cycle-consistent adversarial …

Any-to-many voice conversion with location-relative sequence-to-sequence modeling

S Liu, Y Cao, D Wang, X Wu, X Liu… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
This paper proposes an any-to-many location-relative, sequence-to-sequence (seq2seq),
non-parallel voice conversion approach, which utilizes text supervision during training. In …

Emotion intensity and its control for emotional voice conversion

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while
preserving the linguistic content and speaker identity. In EVC, emotions are usually treated …

Speech synthesis with mixed emotions

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …

Voice transformer network: Sequence-to-sequence voice conversion using transformer with text-to-speech pretraining

WC Huang, T Hayashi, YC Wu, H Kameoka… - arXiv preprint arXiv …, 2019 - arxiv.org
We introduce a novel sequence-to-sequence (seq2seq) voice conversion (VC) model based
on the Transformer architecture with text-to-speech (TTS) pretraining. Seq2seq VC models …

Maskcyclegan-vc: Learning non-parallel voice conversion with filling in frames

T Kaneko, H Kameoka, K Tanaka… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Non-parallel voice conversion (VC) is a technique for training voice converters without a
parallel corpus. Cycle-consistent adversarial network-based VCs (CycleGAN-VC and …