ZMM-TTS: Zero-shot multilingual and multispeaker speech synthesis conditioned on self-supervised discrete speech representations

C Gong, X Wang, E Cooper, D Wells… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker,
single-language synthesis. Multilingual TTS systems are limited to resource-rich languages …

Controllable Accented Text-to-Speech Synthesis With Fine and Coarse-Grained Intensity Rendering

R Liu, B Sisman, G Gao, H Li - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Accented text-to-speech (TTS) synthesis seeks to generate speech with an accent (L2) as a
variant of the standard version (L1), which is challenging as L2 is different from L1 in terms …

Introduction to Audio Deepfake Generation: Academic Insights for Non-Experts

JE Choi, K Schäfer, S Zmudzinski - Proceedings of the 3rd ACM …, 2024 - dl.acm.org
With the advancement of artificial intelligence, the methods for generating audio deepfakes
have improved, but the technology behind it has become more complex. Despite this, non …

VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation

R Badlani, A Arora, S Ghosh, R Valle… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We introduce VANI, a very lightweight multi-lingual accent controllable speech synthesis
system. Our model builds upon disentanglement strategies proposed in RADMMM [1] and …

Improving Multilingual Text-to-Speech with Mixture-of-Language-Experts and Accent Disentanglement

J Wu, T Chen, M Chen, W Hu, S Wang… - Proc. Interspeech …, 2024 - isca-archive.org
Code-switching and accent control are particularly valuable in multilingual text-to-speech
(TTS) systems as both of them contribute to improving the authenticity and …

Single-Model Attribution for Spoofed Speech via Vocoder Fingerprints in an Open-World Setting

M Pizarro, M Laszkiewicz, D Kolossa… - arXiv preprint arXiv …, 2024 - arxiv.org
As speech generation technology advances, so do the potential threats of misusing spoofed
speech signals. One way to address these threats is by attributing the signals to their source …

An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios

C Gong, E Cooper, X Wang, C Qiang, M Geng… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-supervised learning (SSL) representations from massively multilingual models offer a
promising solution for low-resource language speech tasks. Despite advancements …

Scaling NVIDIA's multi-speaker multi-lingual TTS systems with voice cloning to Indic Languages

A Arora, R Badlani, S Kim, R Valle… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we describe the TTS models developed by NVIDIA for the MMITS-VC
(Multi-speaker, Multi-lingual Indic TTS with Voice Cloning) 2024 Challenge. In Tracks 1 and 2, we …

Scaling NVIDIA's Multi-Speaker Multi-Lingual TTS Systems With Zero-Shot TTS to Indic Languages

A Arora, R Badlani, S Kim, R Valle… - … on Acoustics, Speech …, 2024 - ieeexplore.ieee.org
In this paper, we describe the TTS models developed by NVIDIA for the MMITS-VC
(Multi-speaker, Multi-lingual Indic TTS with Voice Cloning) 2024 Challenge. In Tracks 1 and 2, we …

Advancing Deep-Generated Speech and Defending against Its Misuse

Z Cai - 2023 - search.proquest.com
Deep learning has revolutionized speech generation, spanning synthesis areas such as
text-to-speech and voice conversion, leading to diverse advancements. On the one hand, when …