An embedded end-to-end voice assistant

L Lazzaroni, F Bellotti, R Berta - Engineering Applications of Artificial …, 2024 - Elsevier
Voice assistants are spreading in various environments, such as houses and cars, bringing
the possibility of controlling heterogeneous Internet of Things devices with simple voice …

Controllable Accented Text-to-Speech Synthesis With Fine and Coarse-Grained Intensity Rendering

R Liu, B Sisman, G Gao, H Li - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Accented text-to-speech (TTS) synthesis seeks to generate speech with an accent (L2) as a
variant of the standard version (L1), which is challenging as L2 is different from L1 in terms …

Improving mispronunciation detection using speech reconstruction

A Das, R Gutierrez-Osuna - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Training related machine learning tasks simultaneously can lead to improved performance
on both tasks. Text-to-speech (TTS) and mispronunciation detection and diagnosis (MDD) …

Towards zero-shot multi-speaker multi-accent text-to-speech synthesis

M Zhang, X Zhou, Z Wu, H Li - IEEE Signal Processing Letters, 2023 - ieeexplore.ieee.org
This letter presents a framework towards multi-accent neural text-to-speech synthesis for
zero-shot multi-speaker, which employs an encoder-decoder architecture and an accent …

Zero-Shot Emotion Transfer for Cross-Lingual Speech Synthesis

Y Li, X Zhu, Y Lei, H Li, J Liu, D Xie… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Zero-shot emotion transfer in cross-lingual speech synthesis aims to transfer emotion from
an arbitrary speech reference in the source language to the synthetic speech in the target …

Multi-Scale Accent Modeling with Disentangling for Multi-Speaker Multi-Accent TTS Synthesis

X Zhou, M Zhang, Y Zhou, Z Wu, H Li - arXiv preprint arXiv:2406.10844, 2024 - arxiv.org
Synthesizing speech across different accents while preserving the speaker identity is
essential for various real-world customer applications. However, the individual and accurate …

Neural Speech Synthesis for Austrian Dialects with Standard German Grapheme-to-Phoneme Conversion and Dialect Embeddings

L Gutscher, M Pucher, V Garcia - Proc. 2nd Annual Meeting of the …, 2023 - researchgate.net
For languages where extensive audio data and text transcriptions are available, text-to-
speech (TTS) systems have showcased the ability to generate speech that closely …

AccentBox: Towards High-Fidelity Zero-Shot Accent Generation

J Zhong, K Richmond, Z Su, S Sun - arXiv preprint arXiv:2409.09098, 2024 - arxiv.org
While recent Zero-Shot Text-to-Speech (ZS-TTS) models have achieved high naturalness
and speaker similarity, they fall short in accent fidelity and control. To address this issue, we …

Non-autoregressive real-time Accent Conversion model with voice cloning

V Nechaev, S Kosyakov - arXiv preprint arXiv:2405.13162, 2024 - arxiv.org
Currently, the development of Foreign Accent Conversion (FAC) models utilizes deep neural
network architectures, as well as ensembles of neural networks for speech recognition and …