Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation

N Mohammadi Foumani, L Miller, CW Tan… - ACM Computing …, 2024 - dl.acm.org

Time Series Classification and Extrinsic Regression are important and challenging machine
learning tasks. Deep learning has revolutionized natural language processing and computer …

Cited by 81 - Related articles - All 5 versions

[PDF] arxiv.org

Recent progress in the CUHK dysarthric speech recognition system

S Liu, M Geng, S Hu, X Xie, M Cui, J Yu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org

Despite the rapid progress of automatic speech recognition (ASR) technologies in the past
few decades, recognition of disordered speech remains a highly challenging task to date …

Cited by 75 - Related articles - All 8 versions

[PDF] plos.org

An empirical survey of data augmentation for time series classification with neural networks

BK Iwana, S Uchida - Plos one, 2021 - journals.plos.org

In recent times, deep artificial neural networks have achieved many successes in pattern
recognition. Part of this success can be attributed to the reliance on big data to increase …

Cited by 650 - Related articles - All 13 versions

[PDF] arxiv.org

Cascade versus direct speech translation: Do the differences still make a difference?

L Bentivogli, M Cettolo, M Gaido, A Karakanta… - arXiv preprint arXiv …, 2021 - arxiv.org

Five years after the first published proofs of concept, direct approaches to speech translation
(ST) are now competing with traditional cascade solutions. In light of this steady progress …

Cited by 79 - Related articles - All 11 versions

[PDF] mdpi.com

A new approach for detecting fundus lesions using image processing and deep neural network architecture based on YOLO model

C Santos, M Aguiar, D Welfer, B Belloni - Sensors, 2022 - mdpi.com

Diabetic Retinopathy is one of the main causes of vision loss, and in its initial stages, it
presents with fundus lesions, such as microaneurysms, hard exudates, hemorrhages, and …

Cited by 41 - Related articles - All 9 versions

[PDF] arxiv.org

Single headed attention based sequence-to-sequence model for state-of-the-art results on switchboard

Z Tüske, G Saon, K Audhkhasi, B Kingsbury - arXiv preprint arXiv …, 2020 - arxiv.org

It is generally believed that direct sequence-to-sequence (seq2seq) speech recognition
models are competitive with hybrid models only when a large amount of data, at least a …

Cited by 82 - Related articles - All 8 versions

[PDF] arxiv.org

Adaptive multilingual speech recognition with pretrained models

NQ Pham, A Waibel, J Niehues - arXiv preprint arXiv:2205.12304, 2022 - arxiv.org

Multilingual speech recognition with supervised learning has achieved great results as
reflected in recent research. With the development of pretraining methods on audio and text …

Cited by 29 - Related articles - All 3 versions

[PDF] acm.org

Fake speech detection using residual network with transformer encoder

Z Zhang, X Yi, X Zhao - Proceedings of the 2021 ACM workshop on …, 2021 - dl.acm.org

Fake speech detection aims to distinguish fake speech from natural speech. This paper
presents an effective fake speech detection scheme based on residual network with …

Cited by 52 - Related articles

[PDF] arxiv.org

A new training pipeline for an improved neural transducer

A Zeyer, A Merboldt, R Schlüter, H Ney - arXiv preprint arXiv:2005.09319, 2020 - arxiv.org

The RNN transducer is a promising end-to-end model candidate. We compare the original
training criterion with the full marginalization over all alignments, to the commonly used …

Cited by 57 - Related articles - All 8 versions

[PDF] arxiv.org

Under the morphosyntactic lens: A multifaceted evaluation of gender bias in speech translation

B Savoldi, M Gaido, L Bentivogli, M Negri… - arXiv preprint arXiv …, 2022 - arxiv.org

Gender bias is largely recognized as a problematic phenomenon affecting language
technologies, with recent studies underscoring that it might surface differently across …

Cited by 28 - Related articles - All 6 versions
