Decoder-only architecture for streaming end-to-end speech recognition

E Tsunoo, H Futami, Y Kashiwagi, S Arora… - arXiv preprint arXiv …, 2024 - arxiv.org
Decoder-only language models (LMs) have been successfully adopted for speech-processing tasks including automatic speech recognition (ASR). The LMs have ample …

Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing

W Chen, T Kano, A Ogawa, M Delcroix… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
The quadratic memory complexity of self-attention has generally restricted Transformer-based models to utterance-based speech processing, preventing models from leveraging …

An End-to-End Speech Summarization Using Large Language Model

H Shang, Z Li, J Guo, S Li, Z Rao, Y Luo, D Wei… - arXiv preprint arXiv …, 2024 - arxiv.org
Abstractive Speech Summarization (SSum) aims to generate human-like text summaries
from spoken content. It encounters difficulties in handling long speech input and capturing …

End-to-End Modeling for Abstractive Speech Summarization

R Sharma - 2024 - kilthub.cmu.edu
In our increasingly interconnected world, where speech remains the most intuitive and
natural form of communication, spoken language processing systems face a crucial …