A review of physical and perceptual feature extraction techniques for speech, music and environmental sounds

F Alías, JC Socoró, X Sevillano - Applied Sciences, 2016 - mdpi.com
Endowing machines with sensing capabilities similar to those of humans is a prevalent
quest in engineering and computer science. In the pursuit of making computers sense their …

A survey of underwater acoustic data classification methods using deep learning for shoreline surveillance

LCF Domingos, PE Santos, PSM Skelton… - Sensors, 2022 - mdpi.com
This paper presents a comprehensive overview of current deep-learning methods for
automatic object classification of underwater sonar data for shoreline surveillance …

Multi-modal sensor based emotion recognition and emotional interface

O Kalinli-Akbacak - US Patent 9,031,293, 2015 - Google Patents
Features, including one or more acoustic features, visual features, linguistic features, and
physical features may be extracted from signals obtained by one or more sensors with a …

An analysis of convolutional neural networks for speech recognition

JT Huang, J Li, Y Gong - 2015 IEEE International Conference …, 2015 - ieeexplore.ieee.org
Despite the fact that several sites have reported the effectiveness of convolutional neural
networks (CNNs) on some tasks, there is no deep analysis regarding why CNNs perform …

Features for voice activity detection: a comparative analysis

S Graf, T Herbig, M Buck, G Schmidt - EURASIP Journal on Advances in …, 2015 - Springer
In many speech signal processing applications, voice activity detection (VAD) plays an
essential role for separating an audio stream into time intervals that contain speech activity …

Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition

MR Schädler, BT Meyer, B Kollmeier - The Journal of the Acoustical …, 2012 - pubs.aip.org
In an attempt to increase the robustness of automatic speech recognition (ASR) systems, a
feature extraction scheme is proposed that takes spectro-temporal modulation frequencies …

Study of empirical mode decomposition and spectral analysis for stress and emotion classification in natural speech

L He, M Lech, NC Maddage, NB Allen - Biomedical Signal Processing and …, 2011 - Elsevier
Two new approaches to the feature extraction process for automatic stress and emotion
classification in speech are proposed and examined. The first method uses the empirical …

Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition

MR Schädler, B Kollmeier - The Journal of the Acoustical Society of …, 2015 - pubs.aip.org
To test if simultaneous spectral and temporal processing is required to extract robust
features for automatic speech recognition (ASR), the robust spectro-temporal two …

Text-independent speaker identification using Radon and discrete cosine transforms based features from speech spectrogram

PK Ajmera, DV Jadhav, RS Holambe - Pattern Recognition, 2011 - Elsevier
This paper presents a new feature extraction technique for speaker recognition using Radon
transform (RT) and discrete cosine transform (DCT). The spectrogram is compact, efficient in …

Maintaining filter structure: A Gabor-based convolutional neural network for image analysis

S Molaei, MESA Abadi - Applied Soft Computing, 2020 - Elsevier
In image segmentation and classification tasks, utilizing filters based on the target object
improves performance and requires less training data. We use the Gabor filter as …