A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

BECTRA: Transducer-based end-to-end ASR with BERT-enhanced encoder

Y Higuchi, T Ogawa, T Kobayashi… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic speech
recognition (E2E-ASR) model formulated by the transducer with a BERT-enhanced encoder …

Fine-grained textual knowledge transfer to improve RNN transducers for speech recognition and understanding

V Sunder, S Thomas, HKJ Kuo… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
RNN Transducer (RNN-T) technology is very popular for building deployable models for
end-to-end (E2E) automatic speech recognition (ASR) and spoken language understanding …

UniEnc-CASSNAT: An Encoder-only Non-autoregressive ASR for Speech SSL Models

R Fan, NB Shankar, A Alwan - IEEE Signal Processing Letters, 2024 - ieeexplore.ieee.org
Non-autoregressive automatic speech recognition (NASR) models have gained attention
due to their parallelism and fast inference. The encoder-based NASR, e.g. connectionist …

Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems

T Udagawa, M Suzuki, G Kurata… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Transferring the knowledge of large language models (LLMs) is a promising technique to
incorporate linguistic knowledge into end-to-end automatic speech recognition (ASR) …

ConvKT: Conversation-Level Knowledge Transfer for Context Aware End-to-End Spoken Language Understanding

V Sunder, E Fosler-Lussier, S Thomas… - … Conference of the …, 2023 - vishalsunder.github.io
Dialog history enhances downstream classification performance in both speech and text
based dialog systems. However, there still exists a gap in dialog history integration in a fully …

Mask-Conformer: Augmenting Conformer with Mask-Predict Decoder

Y Higuchi, A Rosenberg, Y Wang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Much of the recent progress in automatic speech recognition (ASR) lies in developing an
acoustic encoder, such as enlarging its capacity and designing a refined architecture for …

Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition

K Deng, PC Woodland - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Although end-to-end (E2E) automatic speech recognition (ASR) has shown state-of-the-art
recognition accuracy, it tends to be implicitly biased towards the training data distribution …

Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference

M Someki, N Eng, Y Higuchi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Attention-based encoder-decoder models with autoregressive (AR) decoding have proven
to be the dominant approach for automatic speech recognition (ASR) due to their superior …