Deep learning for time series classification and extrinsic regression: A current survey

N Mohammadi Foumani, L Miller, CW Tan… - ACM Computing …, 2024 - dl.acm.org
Time Series Classification and Extrinsic Regression are important and challenging machine
learning tasks. Deep learning has revolutionized natural language processing and computer …

Recent progress in the CUHK dysarthric speech recognition system

S Liu, M Geng, S Hu, X Xie, M Cui, J Yu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Despite the rapid progress of automatic speech recognition (ASR) technologies in the past
few decades, recognition of disordered speech remains a highly challenging task to date …

An empirical survey of data augmentation for time series classification with neural networks

BK Iwana, S Uchida - PLOS ONE, 2021 - journals.plos.org
In recent times, deep artificial neural networks have achieved many successes in pattern
recognition. Part of this success can be attributed to the reliance on big data to increase …

Cascade versus direct speech translation: Do the differences still make a difference?

L Bentivogli, M Cettolo, M Gaido, A Karakanta… - arXiv preprint arXiv …, 2021 - arxiv.org
Five years after the first published proofs of concept, direct approaches to speech translation
(ST) are now competing with traditional cascade solutions. In light of this steady progress …

A new approach for detecting fundus lesions using image processing and deep neural network architecture based on YOLO model

C Santos, M Aguiar, D Welfer, B Belloni - Sensors, 2022 - mdpi.com
Diabetic Retinopathy is one of the main causes of vision loss, and in its initial stages, it
presents with fundus lesions, such as microaneurysms, hard exudates, hemorrhages, and …

Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard

Z Tüske, G Saon, K Audhkhasi, B Kingsbury - arXiv preprint arXiv …, 2020 - arxiv.org
It is generally believed that direct sequence-to-sequence (seq2seq) speech recognition
models are competitive with hybrid models only when a large amount of data, at least a …

Adaptive multilingual speech recognition with pretrained models

NQ Pham, A Waibel, J Niehues - arXiv preprint arXiv:2205.12304, 2022 - arxiv.org
Multilingual speech recognition with supervised learning has achieved great results as
reflected in recent research. With the development of pretraining methods on audio and text …

Fake speech detection using residual network with transformer encoder

Z Zhang, X Yi, X Zhao - Proceedings of the 2021 ACM workshop on …, 2021 - dl.acm.org
Fake speech detection aims to distinguish fake speech from natural speech. This paper
presents an effective fake speech detection scheme based on residual network with …

A new training pipeline for an improved neural transducer

A Zeyer, A Merboldt, R Schlüter, H Ney - arXiv preprint arXiv:2005.09319, 2020 - arxiv.org
The RNN transducer is a promising end-to-end model candidate. We compare the original
training criterion with the full marginalization over all alignments, to the commonly used …

Under the morphosyntactic lens: A multifaceted evaluation of gender bias in speech translation

B Savoldi, M Gaido, L Bentivogli, M Negri… - arXiv preprint arXiv …, 2022 - arxiv.org
Gender bias is largely recognized as a problematic phenomenon affecting language
technologies, with recent studies underscoring that it might surface differently across …