An overview of affective speech synthesis and conversion in the deep learning era

Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources

H Barakat, O Turk, C Demiroglu - EURASIP Journal on Audio, Speech, and …, 2024 - Springer

Speech synthesis has made significant strides thanks to the transition from machine learning
to deep learning models. Contemporary text-to-speech (TTS) models possess the capability …

Cited by 1; Related articles; All 6 versions

Beyond Deep Learning: Charting the Next Frontiers of Affective Computing

A Triantafyllopoulos, L Christ, A Gebhard… - Intelligent …, 2024 - spj.science.org

Affective computing (AC), as most other areas of computational research, has benefited
tremendously from advances in deep learning (DL). These advances have opened up new …

BASE TTS: Lessons from building a billion-parameter text-to-speech model on 100K hours of data

M Łajszczak, G Cámbara, Y Li, F Beyhan… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce a text-to-speech (TTS) model called BASE TTS, which stands for Big
Adaptive Streamable TTS with Emergent abilities …

Cited by 12; Related articles; All 4 versions

Effect of attention and self-supervised speech embeddings on non-semantic speech tasks

P Mohapatra, A Pandey, Y Sui, Q Zhu - Proceedings of the 31st ACM …, 2023 - dl.acm.org

Human emotion understanding is pivotal in making conversational technology mainstream.
We view speech emotion understanding as a perception task which is a more realistic …

Cited by 3; Related articles; All 5 versions

Turn Left Turn Right-Delving type and modality of instructions in navigation assistant systems for people with visual impairments

B Kuriakose, IM Ness, M Åskov Tengstedt… - International Journal of …, 2023 - Elsevier

Receiving navigation directions and relevant information through appropriate channels is
crucial for individuals with visual impairments when they use navigation assistant systems …

Cited by 3; Related articles; All 4 versions

EMOCONV-Diff: Diffusion-Based Speech Emotion Conversion for Non-Parallel and in-the-Wild Data

NR Prabhu, B Lay, S Welker… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Speech emotion conversion is the task of converting the expressed emotion of a spoken
utterance to a target emotion while preserving the lexical content and speaker identity. While …

Cited by 2; Related articles; All 4 versions

Mdrt: Multi-domain synthetic speech localization

AKS Yadav, K Bhagtani, S Baireddy… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

With recent advancements in generating synthetic speech, tools to generate high-quality
synthetic speech impersonating any human speaker are easily available. Several incidents …

Cited by 1; Related articles

Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis

S Inoue, K Zhou, S Wang, H Li - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

It remains a challenge to effectively control the emotion rendering in text-to-speech (TTS)
synthesis. Prior studies have primarily focused on learning a global prosodic representation …

Cited by 1; Related articles; All 3 versions

Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators

W Hutiri, O Papakyriakopoulos, A Xiang - The 2024 ACM Conference on …, 2024 - dl.acm.org

The rapid and wide-scale adoption of AI to generate human speech poses a range of
significant ethical and safety risks to society that need to be addressed. For example, a …

Cited by 1; Related articles; All 4 versions

Improved Dendritic Learning: Activation Function Analysis

Y Wang, Y Yu, T Zhang, K Song, Y Wang, S Gao - Information Sciences, 2024 - Elsevier

This study conducted a thorough evaluation of an improved dendritic learning (DL)
framework, focusing specifically on its application in power load forecasting. The objective …
