Mass: Multi-task anthropomorphic speech synthesis framework

J Chen, L Ye, Z Ming - Computer Speech & Language, 2021 - Elsevier
Abstract Text-to-Speech (TTS) synthesis plays an important role in human-computer
interaction. Currently, most TTS technologies focus on the naturalness of speech, namely …

An analysis of performance evaluation metrics for voice conversion models

MT Akhter, P Banerjee, S Dhar… - 2022 IEEE 19th India …, 2022 - ieeexplore.ieee.org
The process of transforming a source speaker's vocal style or vocal feature to that of a target
speaker while keeping the linguistic information of the source speaker unchanged is known …

Any-to-Any Voice Conversion with Multi-layer Speaker Adaptation and Content Supervision

X Xu, L Shi, X Chen, P Lin, J Lian… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Any-to-any voice conversion can be performed among arbitrary speakers, even with a single
reference utterance. Many related studies have demonstrated that it can be effectively …

Time domain adversarial voice conversion for ADD 2022

C Wen, T Guo, X Tan, R Yan, S Zhou… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
In this paper, we describe our speech generation system for the first Audio Deep Synthesis
Detection Challenge (ADD 2022). Firstly, we build an any-to-many voice conversion (VC) …

Enhancing the intelligibility of cleft lip and palate speech using cycle-consistent adversarial networks

PN Sudro, RK Das, R Sinha… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
Cleft lip and palate (CLP) refer to a congenital craniofacial condition that causes various
speech-related disorders. As a result of structural and functional deformities, the affected …

[BOOK][B] Game-theoretic approaches for generative modeling

J Gao - 2020 - search.proquest.com
Artificial intelligence has achieved great success in the past decade, but there remains a big
gap between machine and human intelligence. Most supervised approaches rely on large …

EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning

Z Liang, J Wang, X Zhang, Y Zhang, N Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Using unsupervised learning to disentangle speech into content, rhythm, pitch, and timbre
for voice conversion has become a hot research topic. Existing works generally take into …

[PDF][PDF] Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion.

NJ Shah, MC Madhavi, HA Patil - INTERSPEECH, 2018 - researchgate.net
In the non-parallel Voice Conversion (VC) with the Iterative combination of Nearest
Neighbor search step and Conversion step Alignment (INCA) algorithm, the occurrence of …

A comparative analysis of latent regressor losses for singing voice conversion

B O'Connor, S Dixon - arXiv preprint arXiv:2302.13678, 2023 - arxiv.org
Previous research has shown that established techniques for spoken voice conversion (VC)
do not perform as well when applied to singing voice conversion (SVC). We propose an …

[CITATION][C] Non-Parallel Many-to-Many Voice Conversion with PSR-StarGAN.

Y Li, D Xu, Y Zhang, Y Wang, B Chen - Interspeech, 2020