DelightfulTTS: The Microsoft speech synthesis system for Blizzard Challenge 2021

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 135 · Related articles · All 6 versions

[PDF] arxiv.org

NaturalSpeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arXiv preprint arXiv …, 2023 - arxiv.org

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

Cited by 133 · Related articles · All 3 versions

[PDF] arxiv.org

NaturalSpeech: End-to-end text-to-speech synthesis with human-level quality

X Tan, J Chen, H Liu, J Cong, C Zhang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Text-to-speech (TTS) has made rapid progress in both academia and industry in recent
years. Some questions naturally arise, such as whether a TTS system can achieve human-level …

Cited by 153 · Related articles · All 9 versions

[PDF] arxiv.org

AdaSpeech 4: Adaptive text to speech in zero-shot scenarios

Y Wu, X Tan, B Li, L He, S Zhao, R Song, T Qin… - arXiv preprint arXiv …, 2022 - arxiv.org

Adaptive text to speech (TTS) can synthesize new voices in zero-shot scenarios efficiently,
by using a well-trained source TTS model without adapting it on the speech data of new …

Cited by 60 · Related articles · All 6 versions

[PDF] arxiv.org

PromptStyle: Controllable style transfer for text-to-speech with natural language descriptions

G Liu, Y Zhang, Y Lei, Y Chen, R Wang, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org

Style transfer TTS has shown impressive performance in recent years. However, style
control is often restricted to systems built on expressive speech recordings with discrete style …

Cited by 31 · Related articles · All 4 versions

Generation and detection of manipulated multimodal audiovisual content: Advances, trends and open challenges

H Liz-Lopez, M Keita, A Taleb-Ahmed, A Hadid… - Information …, 2024 - Elsevier

Generative deep learning techniques have invaded the public discourse recently. Despite
the advantages, the applications to disinformation are concerning as the counter-measures …

Cited by 12 · Related articles · All 4 versions

[PDF] arxiv.org

DelightfulTTS 2: End-to-end speech synthesis with adversarial vector-quantized auto-encoders

Y Liu, R Xue, L He, X Tan, S Zhao - arXiv preprint arXiv:2207.04646, 2022 - arxiv.org

Current text to speech (TTS) systems usually leverage a cascaded acoustic model and
vocoder pipeline with mel-spectrograms as the intermediate representations, which suffer …

Cited by 27 · Related articles · All 5 versions

[PDF] arxiv.org

Low-resource multilingual and zero-shot multispeaker TTS

F Lux, J Koch, NT Vu - arXiv preprint arXiv:2210.12223, 2022 - arxiv.org

While neural methods for text-to-speech (TTS) have shown great advances in modeling
multiple speakers, even in zero-shot settings, the amount of data needed for those …

Cited by 24 · Related articles · All 3 versions

MSMC-TTS: Multi-stage multi-codebook VQ-VAE based neural TTS

H Guo, F Xie, X Wu, FK Soong… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org

This article aims to improve neural TTS with vector-quantized, compact speech
representations. We propose a Vector-Quantized Variational AutoEncoder (VQ-VAE) based …

Cited by 10 · Related articles · All 2 versions

[PDF] arxiv.org

DiCLET-TTS: Diffusion model based cross-lingual emotion transfer for text-to-speech—A study between English and Mandarin

T Li, C Hu, J Cong, X Zhu, J Li, Q Tian… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

While the performance of cross-lingual TTS based on monolingual corpora has been
significantly improved recently, generating cross-lingual speech still suffers from the foreign …

Cited by 8 · Related articles · All 4 versions
