RobuTrans: A robust transformer-based text-to-speech model

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 150 · Related articles · All 6 versions

[PDF] arxiv.org

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 410 · Related articles · All 2 versions

[PDF] arxiv.org

VioLA: Unified codec language models for speech recognition, synthesis, and translation

T Wang, L Zhou, Z Zhang, Y Wu, S Liu, Y Gaur… - arXiv preprint arXiv …, 2023 - arxiv.org

Recent research shows a big convergence in model architecture, training objectives, and
inference methods across various tasks for different modalities. In this paper, we propose …

Cited by 75 · Related articles · All 2 versions

[PDF] arxiv.org

Review of end-to-end speech synthesis technology based on deep learning

Z Mu, X Yang, Y Dong - arXiv preprint arXiv:2104.09995, 2021 - arxiv.org

As an indispensable part of modern human-computer interaction system, speech synthesis
technology helps users get the output of intelligent machine more easily and intuitively, thus …

Cited by 37 · Related articles · All 2 versions

[PDF] arxiv.org

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org

The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

Cited by 52 · Related articles · All 4 versions

[PDF] arxiv.org

DelightfulTTS: The Microsoft speech synthesis system for Blizzard Challenge 2021

Y Liu, Z Xu, G Wang, K Chen, B Li, X Tan, J Li… - arXiv preprint arXiv …, 2021 - arxiv.org

This paper describes the Microsoft end-to-end neural text to speech (TTS) system:
DelightfulTTS for Blizzard Challenge 2021. The goal of this challenge is to synthesize …

Cited by 62 · Related articles · All 4 versions

[PDF] aaai.org

Multi-SpectroGAN: High-diversity and high-fidelity spectrogram generation with adversarial style combination for speech synthesis

SH Lee, HW Yoon, HR Noh, JH Kim… - Proceedings of the AAAI …, 2021 - ojs.aaai.org

While generative adversarial networks (GANs) based neural text-to-speech (TTS) systems
have shown significant improvement in neural speech synthesis, there is no TTS system to …

Cited by 57 · Related articles · All 5 versions

VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation

T Wang, L Zhou, Z Zhang, Y Wu, S Liu… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Recent research shows a big convergence in model architecture, training objectives, and
inference methods across various tasks for different modalities. In this paper, we propose …

Cited by 3 · Related articles · All 2 versions

[PDF] arxiv.org

FoundationTTS: Text-to-speech for ASR customization with generative language model

R Xue, Y Liu, L He, X Tan, L Liu, E Lin… - arXiv preprint arXiv …, 2023 - arxiv.org

Neural text-to-speech (TTS) generally consists of cascaded architecture with separately
optimized acoustic model and vocoder, or end-to-end architecture with continuous mel …

Cited by 9 · Related articles · All 2 versions

Controllable Accented Text-to-Speech Synthesis With Fine and Coarse-Grained Intensity Rendering

R Liu, B Sisman, G Gao, H Li - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org

Accented text-to-speech (TTS) synthesis seeks to generate speech with an accent (L2) as a
variant of the standard version (L1), which is challenging as L2 is different from L1 in terms …

Cited by 6 · Related articles · All 2 versions
