A comprehensive survey on deep music generation: Multi-level representations, algorithms, evaluations, and future directions

S Ji, J Luo, X Yang - arXiv preprint arXiv:2011.06801, 2020 - arxiv.org
The utilization of deep learning techniques in generating various contents (such as image,
text, etc.) has become a trend. Especially music, the topic of this paper, has attracted …

Generative adversarial networks for speech processing: A review

A Wali, Z Alamgir, S Karim, A Fawaz, MB Ali… - Computer Speech & …, 2022 - Elsevier
Generative adversarial networks (GANs) have seen remarkable progress in recent years.
They are used as generative models for all kinds of data such as text, images, audio, music …

The singing voice conversion challenge 2023

WC Huang, LP Violeta, S Liu, J Shi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual
scientific event aiming to compare and understand different voice conversion (VC) systems …

Audio deepfakes: A survey

Z Khanjani, G Watson, VP Janeja - Frontiers in Big Data, 2023 - frontiersin.org
A deepfake is content or material that is synthetically generated or manipulated using
artificial intelligence (AI) methods, to be passed off as real and can include audio, video …

Transforming spectrum and prosody for emotional voice conversion with non-parallel training data

K Zhou, B Sisman, H Li - arXiv preprint arXiv:2002.00198, 2020 - arxiv.org
Emotional voice conversion aims to convert the spectrum and prosody to change the
emotional patterns of speech, while preserving the speaker identity and linguistic content …

DiffSVC: A diffusion probabilistic model for singing voice conversion

S Liu, Y Cao, D Su, H Meng - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org
Singing voice conversion (SVC) is one promising technique that can enrich the way of
human-computer interaction by endowing a computer the ability to produce high-fidelity and …

VAW-GAN for disentanglement and recomposition of emotional elements in speech

K Zhou, B Sisman, H Li - 2021 IEEE spoken language …, 2021 - ieeexplore.ieee.org
Emotional voice conversion (EVC) aims to convert the emotion of speech from one state to
another while preserving the linguistic content and speaker identity. In this paper, we study …

A review of differentiable digital signal processing for music and speech synthesis

B Hayes, J Shier, G Fazekas, A McPherson… - Frontiers in Signal …, 2024 - frontiersin.org
The term “differentiable digital signal processing” describes a family of techniques in which
loss function gradients are backpropagated through digital signal processors, facilitating …

FastSVC: Fast cross-domain singing voice conversion with feature-wise linear modulation

S Liu, Y Cao, N Hu, D Su… - 2021 IEEE international …, 2021 - ieeexplore.ieee.org
This paper presents FastSVC, a light-weight cross-domain singing voice conversion (SVC)
system, which can achieve high conversion performance, with inference speed 4x faster …

Singing voice conversion with disentangled representations of singer and vocal technique using variational autoencoders

YJ Luo, CC Hsu, K Agres… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
We propose a flexible framework that deals with both singer conversion and singers' vocal
technique conversion. The proposed model is trained on non-parallel corpora …