Novel Inception-GAN for Whisper-to-Normal speech conversion

M Patel, M Parmar, S Doshi, N Shah… - Proceedings of 10th ISCA …, 2019 - academia.edu
Recently, Convolutional Neural Networks (CNN)-based Generative Adversarial
Networks (GANs) are used for Whisper-to-Normal Speech (i.e., WHSP2SPCH) conversion …

Voice conversion using conditional CycleGAN

D Yook, IC Yoo, S Yoo - 2018 International Conference on …, 2018 - ieeexplore.ieee.org
Voice conversion (VC) modifies characteristics of speech, such as gender and speaker
identities. The VC can be applied to various tasks including speaking assistance and …

Non-parallel articulatory-to-acoustic conversion using multiview-based time warping

JA Gonzalez-Lopez, A Gomez-Alanis… - Applied Sciences, 2022 - mdpi.com
In this paper, we propose a novel algorithm called multiview temporal alignment by
dependence maximisation in the latent space (TRANSIENCE) for the alignment of time …

Enhancing the intelligibility of cleft lip and palate speech using cycle-consistent adversarial networks

PN Sudro, RK Das, R Sinha… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
Cleft lip and palate (CLP) refer to a congenital craniofacial condition that causes various
speech-related disorders. As a result of structural and functional deformities, the affected …

MASS: Multi-task anthropomorphic speech synthesis framework

J Chen, L Ye, Z Ming - Computer Speech & Language, 2021 - Elsevier
Text-to-Speech (TTS) synthesis plays an important role in human-computer
interaction. Currently, most TTS technologies focus on the naturalness of speech, namely …

Game-theoretic approaches for generative modeling

J Gao - 2020 - search.proquest.com
Artificial intelligence has achieved great success in the past decade, but there remains a big
gap between machine and human intelligence. Most supervised approaches rely on large …

Any-to-Any Voice Conversion with Multi-layer Speaker Adaptation and Content Supervision

X Xu, L Shi, X Chen, P Lin, J Lian… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Any-to-any voice conversion can be performed among arbitrary speakers, even with a single
reference utterance. Many related studies have demonstrated that it can be effectively …

Audio-visual speech synthesis using vision transformer–enhanced autoencoders with ensemble of loss functions

S Ghosh, S Sarkar, S Ghosh, F Zalkow, ND Jana - Applied Intelligence, 2024 - Springer
Audio-visual speech synthesis (AVSS) has garnered attention in recent years for its utility in
the realm of audio-visual learning. AVSS transforms one speaker's speech into another's …

Voicy: Zero-shot non-parallel voice conversion in noisy reverberant environments

A Mottini, J Lorenzo-Trueba, SVK Karlapati… - arXiv preprint arXiv …, 2021 - arxiv.org
Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of
a source utterance to change the perceived identity of the speaker. While there is a rich …

Learning structured sparse representations for voice conversion

S Ding, G Zhao, C Liberatore… - … /ACM Transactions on …, 2019 - ieeexplore.ieee.org
Sparse-coding techniques for voice conversion assume that an utterance can be
decomposed into a sparse code that only carries linguistic contents, and a dictionary of …