Retrieval-augmented generation for large language models: A survey

Y Gao, Y Xiong, X Gao, K Jia, J Pan, Y Bi, Y Dai… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) demonstrate powerful capabilities, but they still face
challenges in practical applications, such as hallucinations, slow knowledge updates, and …

Retrieving multimodal information for augmented generation: A survey

R Zhao, H Chen, W Wang, F Jiao, XL Do, C Qin… - arXiv preprint arXiv …, 2023 - arxiv.org
As Large Language Models (LLMs) become popular, there emerged an important trend of
using multimodality to augment the LLMs' generation ability, which enables LLMs to better …

Dual-mode NAM: Effective top-k context injection for end-to-end ASR

Z Wu, T Munkhdalai, P Rondon, G Pundak… - Proc …, 2023 - isca-archive.org
ASR systems in real applications must be adapted on the fly to correctly recognize task-
specific contextual terms, such as contacts, application names and media entities. However …

Multi-modal retrieval for large language model based speech recognition

A Gourav, J Kolehmainen, P Shivakumar… - Findings of the …, 2024 - aclanthology.org
Retrieval is a widely adopted approach for improving language models leveraging external
information. As the field moves towards multi-modal large language models, it is important to …

Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models

B Yusuf, MK Baskar, A Rosenberg… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper explores speculative speech recognition (SSR), where we empower
conventional automatic speech recognition (ASR) with speculation capabilities, allowing the …

kNN-CTC: Enhancing ASR via retrieval of CTC pseudo labels

J Zhou, S Zhao, Y Liu, W Zeng… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
The success of retrieval-augmented language models in various natural language
processing (NLP) tasks has been constrained in automatic speech recognition (ASR) …

Speech-enriched Memory for Inference-time Adaptation of ASR Models to Word Dictionaries

A Mittal, S Sarawagi, P Jyothi, G Saon… - Proceedings of the …, 2023 - aclanthology.org
Despite the impressive performance of ASR models on mainstream benchmarks, their
performance on rare words is unsatisfactory. In enterprise settings, often a focused list of …

Retrieval Augmented End-to-End Spoken Dialog Models

M Wang, I Shafran, H Soltau, W Han… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We recently developed a joint speech and language model (SLM [1]) which fuses a
pretrained foundational speech model and a large language model (LLM), while preserving …

High-precision Voice Search Query Correction via Retrievable Speech-text Embedings

C Li, G Wang, K Kastner, H Su, A Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Automatic speech recognition (ASR) systems can suffer from poor recall for various reasons,
such as noisy audio, lack of sufficient training data, etc. Previous work has shown that recall …

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores

J Zhou, S Zhao, H Wang, TH Zhang, H Sun… - arXiv preprint arXiv …, 2024 - arxiv.org
The kNN-CTC model has proven to be effective for monolingual automatic speech
recognition (ASR). However, its direct application to multilingual scenarios like code …