Deep representation learning in speech processing: Challenges, recent advances, and future trends

Discussion of Features for Acoustic Anomaly Detection under Industrial Disturbing Noise in an End-of-Line Test of Geared Motors

P Wißbrock, D Pelkmann… - 2023 IEEE 21st …, 2023 - ieeexplore.ieee.org

In the end-of-line test of geared motors, the evaluation of product quality is important. Due to
time constraints and the high diversity of variants, acoustic measurements are more …

Cited by 1 · Related articles · All 3 versions

Towards Optimal Voice Disentanglement with Weak Supervision

MR Izadi, Y Yan, S Zhang… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

Voice disentanglement, the process of isolating speech or singing voice into several latent
subspaces, each representing certain aspects, holds significant importance in diverse audio …

End-to-end recurrent denoising autoencoder embeddings for speaker identification

E Rituerto-González, C Peláez-Moreno - Neural Computing and …, 2021 - Springer

Speech 'in-the-wild' is a handicap for speaker recognition systems due to the
variability induced by real-life conditions, such as environmental noise and the emotional …

Cited by 5 · Related articles · All 9 versions

[PDF] Emotion modelling for speech generation

K Zhou - PhD thesis, 2022 - researchgate.net

Seq2seq speech synthesis frameworks, such as Tacotron [83], can generate high-quality
synthetic speech. However, such frameworks heavily rely on a large amount of training data …

Cited by 2 · Related articles

Raw Data Is All You Need: Virtual Axle Detector with Enhanced Receptive Field

H Riedel, RS Lorenzen, C Hübler - arXiv preprint arXiv:2309.01574, 2023 - arxiv.org

Rising maintenance costs of ageing infrastructure necessitate innovative monitoring
techniques. This paper presents a new approach for axle detection, enabling real-time …

Domain Adapting Deep Reinforcement Learning for Real-world Speech Emotion Recognition

T Rajapakshe, R Rana, S Khalifa… - arXiv preprint arXiv …, 2022 - arxiv.org

Computers can understand and then engage with people in an emotionally intelligent way
thanks to speech-emotion recognition (SER). However, the performance of SER in cross …

Korean Prosody Phrase Boundary Prediction Model for Speech Synthesis Service in Smart Healthcare

M Kim, Y Jung, HC Kwon - Electronics, 2021 - mdpi.com

Speech processing technology has great potential in the medical field to provide beneficial
solutions for both patients and doctors. Speech interfaces, represented by speech synthesis …

Cited by 1 · Related articles · All 3 versions

Speech recognition for task domains with sparse matched training data

BO Kang, HB Jeon, JG Park - Applied Sciences, 2020 - mdpi.com

We propose two approaches to handle speech recognition for task domains with sparse
matched training data. One is an active learning method that selects training data for the …

Cited by 3 · Related articles · All 5 versions

Heuristics-Based Hyperparameter Tuning for Transfer Learning Algorithms

UP Singh, KP Singh, M Ojha - Advanced Machine Learning with …, 2024 - Springer

Hyperparameters play a crucial role in controlling the learning process, consequently
impacting the model performance significantly. In most machine and deep learning …

Uncertainty and the Medical Interview: Towards Self-Assessment in Machine Learning Models

JD Havtorn - 2024 - orbit.dtu.dk

Natural language plays a key role in healthcare systems worldwide; yet, the medical
interview process has seen little development compared to the strides made in medical …