Reimagining speech: a scoping review of deep learning-based methods for non-parallel voice conversion

AR Bargum, S Serafin, C Erkut - Frontiers in signal processing, 2024 - frontiersin.org
Research on deep learning-powered voice conversion (VC) in speech-to-speech scenarios
is gaining increasing popularity. Although many of the works in the field of voice …

Fine-tune Pre-Trained Models with Multi-Level Feature Fusion for Speaker Verification

S Peng, W Guo, H Wu, Z Li, J Zhang - Proc. Interspeech 2024, 2024 - isca-archive.org
In this paper, we consider speaker verification by fine-tuning the pre-trained model (PTM)
with multi-level features, including multi-layer features from PTM and hand-crafted features …

CorMulT: A Semi-supervised Modality Correlation-aware Multimodal Transformer for Sentiment Analysis

Y Li, R Zhu, W Li - arXiv preprint arXiv:2407.07046, 2024 - arxiv.org
Multimodal sentiment analysis is an active research area that combines multiple data
modalities, e.g., text, image and audio, to analyze human emotions and benefits a variety of …

Improving Voice Style Conversion via Self-attention VAE with Feature Disentanglement

H Yuan, P Li, G Zhao, J Zhang - CCF Conference on Computer Supported …, 2023 - Springer
Voice conversion (VC) is a widely used technique in intelligent speech processing that aims
to modify the speaker's information while preserving the underlying linguistic content …

Innovative Speaker-Adaptive Style Transfer VAE-WadaIN for Enhanced Voice Conversion in Intelligent Speech Processing

M Guo, J Wang, C Liu, L Xue - 2024 4th International …, 2024 - ieeexplore.ieee.org
Voice Conversion (VC) is vital in intelligent speech processing, aiming to alter speech timbre
while maintaining linguistic content. Existing methods often neglect speaking style, including …