Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering

J Galić, B Marković, Đ Grozdić, B Popović, S Šajić - Applied Sciences, 2024 - mdpi.com
Modern Automatic Speech Recognition (ASR) systems are primarily designed to recognize
normal speech. Due to a considerable acoustic mismatch between normal speech and …

Singer identification model using data augmentation and enhanced feature conversion with hybrid feature vector and machine learning

S Hizlisoy, RS Arslan, E Çolakoğlu - … Journal on Audio, Speech, and Music …, 2024 - Springer
Analyzing songs is a problem that is being investigated to aid various operations on music
access platforms. At the beginning of these problems is the identification of the person who …

Robust Forest Sound Classification Using Pareto-Mordukhovich Optimized MFCC in Environmental Monitoring

A Qurthobi, R Damasevicius, V Barzdaitis… - IEEE …, 2025 - ieeexplore.ieee.org
As a complex ecosystem composed of flora and fauna, the forest has always been
vulnerable to threats. Previous researchers utilized environmental audio collections, such as …

Exploring the Impact of Data Augmentation Techniques on Automatic Speech Recognition System Development: A Comparative Study

J Galić, Đ Grozdić - Advances in Electrical & Computer …, 2023 - researchgate.net
Automatic Speech Recognition (ASR) systems are notorious for their poor performance in
adverse conditions, leading to high sensitivity and low robustness. Due to the costly and …

Deep Learning for Arabic Speech Recognition Using Convolutional Neural Networks

S Ouali, S El Garouani - Journal of Electrical Systems, 2024 - search.proquest.com
Extracting the speaker's emotional state from their speech has become an active research
topic lately due to the demand for more human interactive applications. This field of research …

Speech recognition based on the transformer's multi-head attention in Arabic

O Mahmoudi, M Filali-Bouami, M Benchat - International Journal of Speech …, 2024 - Springer
The Transformer model is frequently employed for speech command recognition (SCR)
since it supports parallelization and has internal attention. The high learning speed of this …

EmoDiarize: Speaker Diarization and Emotion Identification from Speech Signals using Convolutional Neural Networks

H Hamza, F Gafoor, F Sithara, G Anil… - arXiv preprint arXiv …, 2023 - arxiv.org
In the era of advanced artificial intelligence and human-computer interaction, identifying
emotions in spoken language is paramount. This research explores the integration of deep …

Arabic Speech Emotion Recognition using Convolutional Neural Networks

S Ouali, S El Garouani - Journal of Electrical Systems, 2024 - search.proquest.com
Emotions are considered an essential and fundamental aspect of human conversations. It
serves as a means for opinion expression and for enlightening others about their …

Towards an Efficient and Accurate Speech Enhancement by a Comprehensive Ablation Study

LA Azcutia - 2024 - oa.upm.es
Speech enhancement tasks are methods that improve the quality and intelligibility of noisy
audio signals. To that end, speech enhancement models are trained to distinguish between …

Enhanced Audio Data Augmentation Based on Time-Frequency Domain Transformation and the W2GAN-GP Model

신도경, 김영대 - 융복합지식학회논문지, 2024 - dbpia.co.kr
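As deep learning technology has recently come to be used in classification systems across various fields, research on maximizing
the performance of deep learning models is being actively pursued. As for the performance of a deep learning model, the amount of training data and …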