Recent advances in convolutional neural networks

J Gu, Z Wang, J Kuen, L Ma, A Shahroudy, B Shuai… - Pattern recognition, 2018 - Elsevier
In the last few years, deep learning has led to very good performance on a variety of
problems, such as visual recognition, speech recognition and natural language processing …

Deep representation learning in speech processing: Challenges, recent advances, and future trends

S Latif, R Rana, S Khalifa, R Jurdak, J Qadir… - arXiv preprint arXiv …, 2020 - arxiv.org
Research on speech processing has traditionally considered the task of designing hand-
engineered acoustic features (feature engineering) as a separate distinct problem from the …

Robust speech emotion recognition using CNN+LSTM based on stochastic fractal search optimization algorithm

AA Abdelhamid, ESM El-Kenawy, B Alotaibi… - IEEE …, 2022 - ieeexplore.ieee.org
One of the main challenges facing the current approaches of speech emotion recognition is
the lack of a dataset large enough to train the currently available deep learning models …

Deep learning for audio signal processing

H Purwins, B Li, T Virtanen, J Schlüter… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
Given the recent surge in developments of deep learning, this paper provides a review of the
state-of-the-art deep learning techniques for audio signal processing. Speech, music, and …

Speech emotion recognition using deep 1D & 2D CNN LSTM networks

J Zhao, X Mao, L Chen - Biomedical signal processing and control, 2019 - Elsevier
We aimed at learning deep emotion features to recognize speech emotion. Two
convolutional neural network and long short-term memory (CNN LSTM) networks, one 1D …

Wavenet: A generative model for raw audio

A Van Den Oord, S Dieleman, H Zen… - arXiv preprint arXiv …, 2016 - academia.edu
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
The model is fully probabilistic and autoregressive, with the predictive distribution for each …

Speech emotion recognition from 3D log-mel spectrograms with deep learning network

H Meng, T Yan, F Yuan, H Wei - IEEE access, 2019 - ieeexplore.ieee.org
Speech emotion recognition is a vital and challenging task in which feature extraction plays a
significant role in SER performance. With the development of deep learning, we put our …

Adieu features? end-to-end speech emotion recognition using a deep convolutional recurrent network

G Trigeorgis, F Ringeval, R Brueckner… - … on acoustics, speech …, 2016 - ieeexplore.ieee.org
The automatic recognition of spontaneous emotions from speech is a challenging task. On
the one hand, acoustic features need to be robust enough to capture the emotional content …

LEAF: A learnable frontend for audio classification

N Zeghidour, O Teboul, FDC Quitry… - arXiv preprint arXiv …, 2021 - arxiv.org
Mel-filterbanks are fixed, engineered audio features which emulate human perception and
have been used through the history of audio understanding up to today. However, their …

Learning the speech front-end with raw waveform CLDNNs

TN Sainath, RJ Weiss, AW Senior, KW Wilson… - Interspeech, 2015 - isca-archive.org
Learning an acoustic model directly from the raw waveform has been an active area of
research. However, waveform-based models have not yet matched the performance of …