A review on generative adversarial networks: Algorithms, theory, and applications

J Gui, Z Sun, Y Wen, D Tao, J Ye - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Generative adversarial networks (GANs) have recently become a hot research topic;
however, they have been studied since 2014, and a large number of algorithms have been …

An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as Tensorflow or Keras, and open-source trained models, along with …

AutoVC: Zero-shot voice style transfer with only autoencoder loss

K Qian, Y Zhang, S Chang, X Yang… - International …, 2019 - proceedings.mlr.press
Despite the progress in voice conversion, many-to-many voice conversion trained on
non-parallel data, as well as zero-shot voice conversion, remains under-explored. Deep style …

CycleGAN-VC2: Improved CycleGAN-based non-parallel voice conversion

T Kaneko, H Kameoka, K Tanaka… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
Non-parallel voice conversion (VC) is a technique for learning the mapping from source to
target speech without relying on parallel data. This is an important task, but it has been …

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier
In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

K Qian, Z Jin, M Hasegawa-Johnson… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Non-parallel many-to-many voice conversion remains an interesting but challenging speech
processing task. Many style-transfer-inspired methods such as generative adversarial …

Non-parallel sequence-to-sequence voice conversion with disentangled linguistic and speaker representations

JX Zhang, ZH Ling, LR Dai - IEEE/ACM Transactions on Audio …, 2019 - ieeexplore.ieee.org
This article presents a method of sequence-to-sequence (seq2seq) voice conversion using
non-parallel training data. In this method, disentangled linguistic and speaker …

Silent speech interfaces for speech restoration: A review

JA Gonzalez-Lopez, A Gomez-Alanis… - IEEE …, 2020 - ieeexplore.ieee.org
This review summarises the status of silent speech interface (SSI) research. SSIs rely on
non-acoustic biosignals generated by the human body during speech production to enable …

Any-to-many voice conversion with location-relative sequence-to-sequence modeling

S Liu, Y Cao, D Wang, X Wu, X Liu… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
This paper proposes an any-to-many location-relative, sequence-to-sequence (seq2seq),
non-parallel voice conversion approach, which utilizes text supervision during training. In …