NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arXiv preprint arXiv …, 2023 - arxiv.org

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

Cited by 210 · Related articles · All 3 versions

Seamless: Multilingual Expressive and Streaming Speech Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org

Large-scale automatic speech translation systems today lack key features that help machine-
mediated communication feel seamless when compared to human-to-human dialogue. In …

Cited by 100 · Related articles

Avocodo: Generative adversarial network for artifact-free vocoder

T Bak, J Lee, H Bae, J Yang, JS Bae… - Proceedings of the AAAI …, 2023 - ojs.aaai.org

Neural vocoders based on the generative adversarial neural network (GAN) have been
widely used due to their fast inference speed and lightweight networks while generating …

Cited by 38 · Related articles · All 4 versions

Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources

H Barakat, O Turk, C Demiroglu - EURASIP Journal on Audio, Speech, and …, 2024 - Springer

Speech synthesis has made significant strides thanks to the transition from machine learning
to deep learning models. Contemporary text-to-speech (TTS) models possess the capability …

Cited by 11 · Related articles · All 6 versions

Controllable Accented Text-to-Speech Synthesis With Fine and Coarse-Grained Intensity Rendering

R Liu, B Sisman, G Gao, H Li - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org

Accented text-to-speech (TTS) synthesis seeks to generate speech with an accent (L2) as a
variant of the standard version (L1), which is challenging as L2 is different from L1 in terms …

Cited by 13 · Related articles · All 2 versions

MSceneSpeech: A multi-scene speech dataset for expressive speech synthesis

Q Yang, J Zuo, Z Su, Z Jiang, M Li, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce an open source high-quality Mandarin TTS dataset MSceneSpeech (Multiple
Scene Speech Dataset), which is intended to provide resources for expressive speech …

Cited by 1 · Related articles · All 5 versions

Using a large language model to control speaking style for expressive TTS

AT Sigurgeirsson, S King - Dialogue, 2023 - isca-archive.org

Large generative language models have been used to solve various language-related
tasks. We explore whether such models can suggest appropriate prosody for expressive …

Cited by 4 · Related articles · All 2 versions

PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control

S Zhang, A Mehrish, Y Li, S Poria - arXiv preprint arXiv:2501.06276, 2025 - arxiv.org

Speech synthesis has significantly advanced from statistical methods to deep neural
network architectures, leading to various text-to-speech (TTS) models that closely mimic …

Controllable Speaking Styles Using A Large Language Model

A Sigurgeirsson, S King - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

Reference-based Text-to-Speech (TTS) models can generate multiple, prosodically-different
renditions of the same target text. Such models jointly learn a latent acoustic space during …

Cited by 5 · Related articles · All 2 versions

MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech

T Bak, Y Eom, SJ Choi, YS Joo - arXiv preprint arXiv:2410.03192, 2024 - arxiv.org

Text-to-speech (TTS) systems that scale up the amount of training data have achieved
significant improvements in zero-shot speech synthesis. However, these systems have …