M Zhang, E Tang, H Ding, Y Zhang - Journal of Speech, Language …, 2024 - pubs.asha.org
Purpose: As artificial intelligence (AI) takes an increasingly prominent role in health care, a growing body of research is being dedicated to its application in the investigation of …
The amount of labeled data to train models for speech tasks is limited for most languages; however, the data scarcity is exacerbated for speech translation, which requires labeled data …
Self-supervised learning (SSL) based speech foundation models have been applied to a wide range of ASR tasks. However, their application to dysarthric and elderly speech via …
While neural text-to-speech (TTS) has achieved human-like natural synthetic speech, multilingual TTS systems are limited to resource-rich languages due to the need for paired …
In this paper, we propose a novel unsupervised text-to-speech acoustic model training scheme, named UTTS, which does not require text-audio pairs. UTTS is a multi-speaker …
Automatic speech recognition (ASR) has ensured a convenient and fast mode of communication between humans and computers. It has become more accurate over the …
Although humans engaged in face-to-face conversation simultaneously communicate both verbally and non-verbally, methods for joint and unified synthesis of speech audio and co …
M Jeong, M Kim, BJ Choi, J Yoon… - … /ACM Transactions on …, 2024 - ieeexplore.ieee.org
Though neural text-to-speech (TTS) models show remarkable performance, they still require a large amount of paired data, which is expensive to collect. The heavy demand for …
In this paper, we propose a novel unsupervised text-to-speech (UTTS) framework which does not require text-audio pairs for the TTS acoustic modeling (AM). UTTS is a multi …