Estimating phoneme class conditional probabilities from raw speech signal using convolutional...

J Gu, Z Wang, J Kuen, L Ma, A Shahroudy, B Shuai… - Pattern recognition, 2018 - Elsevier

In the last few years, deep learning has led to very good performance on a variety of
problems, such as visual recognition, speech recognition and natural language processing …

Cited by 6995 · Related articles · All 7 versions

[PDF] arxiv.org

Deep representation learning in speech processing: Challenges, recent advances, and future trends

S Latif, R Rana, S Khalifa, R Jurdak, J Qadir… - arXiv preprint arXiv …, 2020 - arxiv.org

Research on speech processing has traditionally considered the task of designing hand-
engineered acoustic features (feature engineering) as a separate distinct problem from the …

Cited by 114 · Related articles · All 3 versions

[PDF] ieee.org

Robust speech emotion recognition using CNN+LSTM based on stochastic fractal search optimization algorithm

AA Abdelhamid, ESM El-Kenawy, B Alotaibi… - IEEE …, 2022 - ieeexplore.ieee.org

One of the main challenges facing the current approaches of speech emotion recognition is
the lack of a dataset large enough to train the currently available deep learning models …

Cited by 138 · Related articles · All 7 versions

[PDF] arxiv.org

Deep learning for audio signal processing

H Purwins, B Li, T Virtanen, J Schlüter… - IEEE Journal of …, 2019 - ieeexplore.ieee.org

Given the recent surge in developments of deep learning, this paper provides a review of the
state-of-the-art deep learning techniques for audio signal processing. Speech, music, and …

Cited by 915 · Related articles · All 7 versions

[PDF] academia.edu

Speech emotion recognition using deep 1D & 2D CNN LSTM networks

J Zhao, X Mao, L Chen - Biomedical signal processing and control, 2019 - Elsevier

We aimed at learning deep emotion features to recognize speech emotion. Two
convolutional neural network and long short-term memory (CNN LSTM) networks, one 1D …

Cited by 1127 · Related articles · All 3 versions

[PDF] academia.edu

[PDF] WaveNet: A generative model for raw audio

A Van Den Oord, S Dieleman, H Zen… - arXiv preprint arXiv …, 2016 - academia.edu

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
The model is fully probabilistic and autoregressive, with the predictive distribution for each …

Cited by 5912 · Related articles · All 10 versions

[PDF] ieee.org

Speech emotion recognition from 3D log-mel spectrograms with deep learning network

H Meng, T Yan, F Yuan, H Wei - IEEE Access, 2019 - ieeexplore.ieee.org

Speech emotion recognition is a vital and challenging task that the feature extraction plays a
significant role in the SER performance. With the development of deep learning, we put our …

Cited by 332 · Related articles · All 4 versions

[PDF] uni-augsburg.de

Adieu features? end-to-end speech emotion recognition using a deep convolutional recurrent network

G Trigeorgis, F Ringeval, R Brueckner… - … on acoustics, speech …, 2016 - ieeexplore.ieee.org

The automatic recognition of spontaneous emotions from speech is a challenging task. On
the one hand, acoustic features need to be robust enough to capture the emotional content …

Cited by 1129 · Related articles · All 22 versions

[PDF] arxiv.org

LEAF: A learnable frontend for audio classification

N Zeghidour, O Teboul, FDC Quitry… - arXiv preprint arXiv …, 2021 - arxiv.org

Mel-filterbanks are fixed, engineered audio features which emulate human perception and
have been used through the history of audio understanding up to today. However, their …

Cited by 181 · Related articles · All 3 versions

[PDF] isca-archive.org

[PDF] Learning the speech front-end with raw waveform CLDNNs.

TN Sainath, RJ Weiss, AW Senior, KW Wilson… - Interspeech, 2015 - isca-archive.org

Learning an acoustic model directly from the raw waveform has been an active area of
research. However, waveform-based models have not yet matched the performance of …

Cited by 628 · Related articles · All 10 versions
