This survey paper provides a comprehensive overview of the recent advancements and challenges in applying large language models to the field of audio signal processing. Audio …
We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic speech recognition (E2E-ASR) model formulated by the transducer with a BERT-enhanced encoder …
V Sunder, S Thomas, HKJ Kuo… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
RNN Transducer (RNN-T) technology is very popular for building deployable models for end-to-end (E2E) automatic speech recognition (ASR) and spoken language understanding …
Non-autoregressive automatic speech recognition (NASR) models have gained attention due to their parallelism and fast inference. The encoder-based NASR, e.g. connectionist …
T Udagawa, M Suzuki, G Kurata… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Transferring the knowledge of large language models (LLMs) is a promising technique to incorporate linguistic knowledge into end-to-end automatic speech recognition (ASR) …
Dialog history enhances downstream classification performance in both speech- and text-based dialog systems. However, there still exists a gap in dialog history integration in a fully …
Y Higuchi, A Rosenberg, Y Wang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Much of the recent progress in automatic speech recognition (ASR) lies in developing an acoustic encoder, such as enlarging its capacity and designing a refined architecture for …
K Deng, PC Woodland - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Although end-to-end (E2E) automatic speech recognition (ASR) has shown state-of-the-art recognition accuracy, it tends to be implicitly biased towards the training data distribution …
M Someki, N Eng, Y Higuchi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Attention-based encoder-decoder models with autoregressive (AR) decoding have proven to be the dominant approach for automatic speech recognition (ASR) due to their superior …