Learning filterbanks from raw speech for phone recognition

S Latif, R Rana, S Khalifa, R Jurdak, J Qadir… - arXiv preprint arXiv …, 2020 - arxiv.org

Research on speech processing has traditionally considered the task of designing hand-engineered acoustic features (feature engineering) as a separate distinct problem from the …

Cited by 112 · Related articles · All 3 versions

[PDF] arxiv.org

Audio deepfake detection: A survey

J Yi, C Wang, J Tao, X Zhang, CY Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Audio deepfake detection is an emerging active topic. A growing number of literatures have
aimed to study deepfake detection algorithms and achieved effective performance, the …

Cited by 57 · Related articles · All 4 versions

[PDF] neurips.cc

wav2vec 2.0: A framework for self-supervised learning of speech representations

A Baevski, Y Zhou, A Mohamed… - Advances in neural …, 2020 - proceedings.neurips.cc

We show for the first time that learning powerful representations from speech audio alone
followed by fine-tuning on transcribed speech can outperform the best semi-supervised …

Cited by 6031 · Related articles · All 11 versions

[PDF] arxiv.org

Tera: Self-supervised learning of transformer encoder representation for speech

AT Liu, SW Li, H Lee - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org

We introduce a self-supervised speech pre-training method called TERA, which stands for
Transformer Encoder Representations from Alteration. Recent approaches often learn by …

Cited by 407 · Related articles · All 6 versions

[PDF] arxiv.org

vq-wav2vec: Self-supervised learning of discrete speech representations

A Baevski, S Schneider, M Auli - arXiv preprint arXiv:1910.05453, 2019 - arxiv.org

We propose vq-wav2vec to learn discrete representations of audio segments through a
wav2vec-style self-supervised context prediction task. The algorithm uses either a gumbel …

Cited by 759 · Related articles · All 5 versions
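The vq-wav2vec snippet mentions a Gumbel-based step for learning discrete representations. The sketch below shows one common form of that step: Gumbel-softmax sampling over codebook logits, followed by a hard pick of one codebook entry. It is a stdlib-only illustration and not the paper's implementation. The codebook, the temperature `tau` and the helper names are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Perturb logits with Gumbel(0, 1) noise and apply a temperature softmax.

    Returns (soft_probs, hard_index). hard_index is the argmax, i.e. the
    discrete code a straight-through estimator would use in the forward pass.
    """
    noisy = []
    for z in logits:
        u = rng.random()
        while u == 0.0:  # guard: log(0) is undefined
            u = rng.random()
        # Gumbel noise is -log(-log(U)) for U ~ Uniform(0, 1).
        noisy.append((z - math.log(-math.log(u))) / tau)
    m = max(noisy)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    probs = [e / total for e in exps]
    hard_index = max(range(len(probs)), key=probs.__getitem__)
    return probs, hard_index


def quantize(logits, codebook, tau=1.0, rng=random):
    """Replace a frame by the (hypothetical) codebook vector picked by the hard sample."""
    _, idx = gumbel_softmax(logits, tau, rng)
    return idx, codebook[idx]


if __name__ == "__main__":
    rng = random.Random(0)
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(quantize([0.1, 5.0, 0.2], codebook, tau=0.5, rng=rng))
```

Lower values of `tau` push `soft_probs` toward a one-hot vector. The `while` guard matters because `random.random()` can return exactly 0.0.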

[PDF] arxiv.org

wav2vec: Unsupervised pre-training for speech recognition

S Schneider, A Baevski, R Collobert, M Auli - arXiv preprint arXiv …, 2019 - arxiv.org

We explore unsupervised pre-training for speech recognition by learning representations of
raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting …

Cited by 1680 · Related articles · All 12 versions

[PDF] arxiv.org

Deep audio-visual speech recognition

T Afouras, JS Chung, A Senior… - IEEE transactions on …, 2018 - ieeexplore.ieee.org

The goal of this work is to recognise phrases and sentences being spoken by a talking face,
with or without the audio. Unlike previous works that have focussed on recognising a limited …

Cited by 941 · Related articles · All 15 versions

[PDF] researchgate.net

Speaker recognition from raw waveform with sincnet

M Ravanelli, Y Bengio - 2018 IEEE spoken language …, 2018 - ieeexplore.ieee.org

Deep learning is progressively gaining popularity as a viable alternative to i-vectors for
speaker recognition. Promising results have been recently obtained with Convolutional …

Cited by 991 · Related articles · All 10 versions
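SincNet, named in the title above, builds its first convolutional layer from band-pass filters that depend only on two cutoff frequencies. The sketch below is a minimal, stdlib-only version of one such kernel. It assumes the difference-of-low-passes form g[n] = 2·f2·sinc(2π·f2·n) − 2·f1·sinc(2π·f1·n), with f1 and f2 in cycles per sample, multiplied by a Hamming window. The kernel length, the window choice and the helper names are illustrative assumptions. In a trained model, f1 and f2 would be learned rather than fixed.

```python
import math


def sinc(x):
    """Unnormalized sinc: sin(x) / x, with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def sinc_bandpass(f1_hz, f2_hz, sample_rate, length=101):
    """Windowed band-pass FIR kernel: difference of two ideal low-pass sincs."""
    if length % 2 == 0:
        raise ValueError("use an odd length so the kernel is centred on n = 0")
    if not 0.0 <= f1_hz < f2_hz <= sample_rate / 2:
        raise ValueError("need 0 <= f1 < f2 <= Nyquist")
    f1 = f1_hz / sample_rate  # cutoffs in cycles per sample
    f2 = f2_hz / sample_rate
    half = length // 2
    kernel = []
    for i in range(length):
        n = i - half
        g = 2 * f2 * sinc(2 * math.pi * f2 * n) - 2 * f1 * sinc(2 * math.pi * f1 * n)
        # Hamming window tames the ripple caused by truncating the ideal filter.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        kernel.append(g * w)
    return kernel


def frequency_response(kernel, freq_hz, sample_rate):
    """Magnitude of the kernel's DTFT at one frequency, for checking the passband."""
    omega = 2 * math.pi * freq_hz / sample_rate
    re = sum(h * math.cos(omega * k) for k, h in enumerate(kernel))
    im = -sum(h * math.sin(omega * k) for k, h in enumerate(kernel))
    return math.hypot(re, im)


if __name__ == "__main__":
    k = sinc_bandpass(300, 3000, 16000, length=251)
    for f in (0, 1500, 6000):
        print(f, round(frequency_response(k, f, 16000), 3))
```

Only two numbers per filter need to be learned, which makes this layer much smaller than a standard convolution with free taps.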

[PDF] arxiv.org

LEAF: A learnable frontend for audio classification

N Zeghidour, O Teboul, FDC Quitry… - arXiv preprint arXiv …, 2021 - arxiv.org

Mel-filterbanks are fixed, engineered audio features which emulate human perception and
have been used through the history of audio understanding up to today. However, their …

Cited by 180 · Related articles · All 3 versions
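The LEAF snippet describes mel-filterbanks as fixed, engineered features. As a reference point for what a learnable frontend replaces, the sketch below builds a standard fixed mel filterbank. It is not LEAF itself. It assumes the HTK-style mel formula m = 2595·log10(1 + f/700) and triangular filters whose centres are evenly spaced on the mel scale. The filter count, FFT size and function names are illustrative.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters, one row of n_fft // 2 + 1 weights per filter."""
    f_max = sample_rate / 2.0 if f_max is None else f_max
    n_bins = n_fft // 2 + 1
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    # n_filters + 2 mel-spaced points give each triangle a left edge, peak and right edge.
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters + 1)
    edges = [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]
    bank = []
    for j in range(n_filters):
        left, centre, right = edges[j], edges[j + 1], edges[j + 2]
        row = []
        for f in bin_hz:
            if left < f <= centre:
                row.append((f - left) / (centre - left))  # rising slope
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def apply_filterbank(bank, power_spectrum):
    """Log mel energies for one frame: log(sum_k W[j][k] * P[k] + eps)."""
    return [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + 1e-10)
            for row in bank]


if __name__ == "__main__":
    bank = mel_filterbank(40, 512, 16000)
    print(len(bank), len(bank[0]))
```

Every weight here is fixed by the formula. A learnable frontend would instead treat parameters such as the filter centres and widths as trainable.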

[PDF] researchgate.net

Interpretable convolutional filters with sincnet

M Ravanelli, Y Bengio - arXiv preprint arXiv:1811.09725, 2018 - arxiv.org

Deep learning is currently playing a crucial role toward higher levels of artificial intelligence.
This paradigm allows neural networks to learn complex and abstract representations, that …

Cited by 148 · Related articles · All 5 versions
