Scaling speech technology to 1,000+ languages

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

YourTTS: Towards zero-shot multi-speaker TTS and zero-shot voice conversion for everyone

E Casanova, J Weber, CD Shulby… - International …, 2022 - proceedings.mlr.press
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker
TTS. Our method builds upon the VITS model and adds several novel modifications for zero …

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier
In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

The decades progress on code-switching research in NLP: A systematic survey on trends and challenges

GI Winata, AF Aji, ZX Yong, T Solorio - arXiv preprint arXiv:2212.09660, 2022 - arxiv.org
Code-Switching, a common phenomenon in written text and conversation, has been studied
over decades by the natural language processing (NLP) research community. Initially, code …

SANE-TTS: stable and natural end-to-end multilingual text-to-speech

H Cho, W Jung, J Lee, SH Woo - arXiv preprint arXiv:2206.12132, 2022 - arxiv.org
In this paper, we present SANE-TTS, a stable and natural end-to-end multilingual TTS
model. By the difficulty of obtaining multilingual corpus for given speaker, training …

A systematic review and analysis of multilingual data strategies in text-to-speech for low-resource languages

P Do, M Coler, J Dijkstra, E Klabbers - Interspeech 2021, 2021 - research.rug.nl
We provide a systematic review of past studies that use multilingual data for text-to-speech
(TTS) of low-resource languages (LRLs). We focus on the strategies used by these studies …

Many-to-many spoken language translation via unified speech and text representation learning with unit-to-unit translation

M Kim, J Choi, D Kim, YM Ro - arXiv preprint arXiv:2308.01831, 2023 - arxiv.org
In this paper, we propose a method to learn unified representations of multilingual speech
and text with a single model, especially focusing on the purpose of speech synthesis. We …

ZMM-TTS: Zero-shot multilingual and multispeaker speech synthesis conditioned on self-supervised discrete speech representations

C Gong, X Wang, E Cooper, D Wells… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Neural text-to-speech (TTS) has achieved humanlike synthetic speech for single-speaker,
single-language synthesis. Multilingual TTS systems are limited to resource-rich languages …

Deep speech synthesis from articulatory representations

P Wu, S Watanabe, L Goldstein, AW Black… - arXiv preprint arXiv …, 2022 - arxiv.org
In the articulatory synthesis task, speech is synthesized from input features containing
information about the physical behavior of the human vocal tract. This task provides a …