A deep learning approaches in text-to-speech system: a systematic review and recent research perspective

Y Kumar, A Koul, C Singh - Multimedia Tools and Applications, 2023 - Springer
Text-to-speech systems (TTS) have come a long way in the last decade and are now a
popular research topic for creating various human-computer interaction systems. Although a …

Artificial intelligence and the future of communication sciences and disorders: A bibliometric and visualization analysis

M Zhang, E Tang, H Ding, Y Zhang - Journal of Speech, Language …, 2024 - pubs.asha.org
Purpose: As artificial intelligence (AI) takes an increasingly prominent role in health care, a
growing body of research is being dedicated to its application in the investigation of …

Simple and effective unsupervised speech translation

C Wang, H Inaguma, PJ Chen, I Kulikov, Y Tang… - arXiv preprint arXiv …, 2022 - arxiv.org
The amount of labeled data to train models for speech tasks is limited for most languages;
however, the data scarcity is exacerbated for speech translation, which requires labeled data …

Self-supervised ASR models and features for dysarthric and elderly speech recognition

S Hu, X Xie, M Geng, Z Jin, J Deng, G Li… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Self-supervised learning (SSL) based speech foundation models have been applied to a
wide range of ASR tasks. However, their application to dysarthric and elderly speech via …

Learning to speak from text: Zero-shot multilingual text-to-speech with unsupervised text pretraining

T Saeki, S Maiti, X Li, S Watanabe, S Takamichi… - arXiv preprint arXiv …, 2023 - arxiv.org
While neural text-to-speech (TTS) has achieved human-like natural synthetic speech,
multilingual TTS systems are limited to resource-rich languages due to the need for paired …

Unsupervised TTS acoustic modeling for TTS with conditional disentangled sequential VAE

J Lian, C Zhang, GK Anumanchipalli… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org
In this paper, we propose a novel unsupervised text-to-speech acoustic model training
scheme, named UTTS, which does not require text-audio pairs. UTTS is a multi-speaker …

Improvement in automatic speech recognition of South Asian accent using transfer learning of DeepSpeech2

MA Hassan, A Rehmat, MU Ghani Khan… - Mathematical …, 2022 - Wiley Online Library
Automatic speech recognition (ASR) provides a convenient and fast mode of
communication between humans and computers. It has become more accurate over the …

Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis

S Mehta, A Deichler, J O'Regan… - Proceedings of the …, 2024 - openaccess.thecvf.com
Although humans engaged in face-to-face conversation simultaneously communicate both
verbally and non-verbally, methods for joint and unified synthesis of speech audio and co …

Transfer Learning for Low-Resource, Multi-Lingual, and Zero-Shot Multi-Speaker Text-to-Speech

M Jeong, M Kim, BJ Choi, J Yoon… - … /ACM Transactions on …, 2024 - ieeexplore.ieee.org
Though neural text-to-speech (TTS) models show remarkable performance, they still require
a large amount of paired data, which is expensive to collect. The heavy demand for …

UTTS: Unsupervised TTS with conditional disentangled sequential variational auto-encoder

J Lian, C Zhang, G Krishna Anumanchipalli… - arXiv e …, 2022 - ui.adsabs.harvard.edu
In this paper, we propose a novel unsupervised text-to-speech (UTTS) framework which
does not require text-audio pairs for the TTS acoustic modeling (AM). UTTS is a multi …