Amphion: An open-source audio, music and speech generation toolkit

X Zhang, L Xue, Y Gu, Y Wang, J Li, H He… - arXiv preprint arXiv …, 2023 - arxiv.org
Amphion is an open-source toolkit for Audio, Music, and Speech Generation that aims to
ease the way for junior researchers and engineers into these fields. It presents a unified …

CoMoSVC: Consistency Model-based Singing Voice Conversion

Y Lu, Z Ye, W Xue, X Tan, Q Liu, Y Guo - arXiv preprint arXiv:2401.01792, 2024 - arxiv.org
Diffusion-based Singing Voice Conversion (SVC) methods have achieved remarkable
performance, producing natural audio with high similarity to the target timbre. However …

SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion

L Xue, C Wang, M Wang, X Zhang, J Han… - arXiv preprint arXiv …, 2024 - arxiv.org
In this study, we present SingVisio, an interactive visual analysis system that aims to explain
the diffusion model used in singing voice conversion. SingVisio provides a visual display of …

HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts

X Niu, J Zhang, CP Martin - arXiv preprint arXiv:2404.15637, 2024 - arxiv.org
We introduce HybridVC, a voice conversion (VC) framework built upon a pre-trained
conditional variational autoencoder (CVAE) that combines the strengths of a latent model …

Zero-shot Voice Conversion with Diffusion Transformers

S Liu - arXiv preprint arXiv:2411.09943, 2024 - arxiv.org
Zero-shot voice conversion aims to transform a source speech utterance to match the timbre
of a reference speech from an unseen speaker. Traditional approaches struggle with timbre …

RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion

W Chen, X Zhao, J Chen, B Sha… - 2024 IEEE 14th …, 2024 - ieeexplore.ieee.org
Singing voice conversion (SVC) is hindered by noise sensitivity due to the use of non-robust
methods for extracting pitch and energy during inference. As clean signals are key for …

SingNet: Towards a Large-Scale, Diverse, and In-the-Wild Singing Voice Dataset

Y Gu, C Wang, J Zhang, X Zhang, Z Fang, H He, Z Wu - openreview.net
The lack of a publicly available, large-scale, and diverse dataset has long been a significant
bottleneck for singing voice applications like Singing Voice Synthesis (SVS) and Singing …