Since the proposal of transformers, these models have been limited to bounded input lengths, because of their need to attend to every token in the input. In this work, we propose …
FF Xu, U Alon, G Neubig - International Conference on …, 2023 - proceedings.mlr.press
Language models (LMs) compute the probability of a text by sequentially computing a representation of an already-seen context and using this representation to predict the next …
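The snippet above describes the standard autoregressive step that the retrieval-augmented methods in this list build on. As a minimal sketch (hidden size, vocabulary size, and variable names below are illustrative placeholders, not taken from the paper), the next-token distribution is a softmax over the vocabulary computed from the context representation:

```python
import numpy as np

def next_token_distribution(h, W):
    """Standard LM prediction step: the context representation h (e.g. the
    final hidden state over the already-seen tokens) is projected onto the
    vocabulary and normalized with a softmax."""
    logits = W @ h                      # (vocab_size,)
    logits -= logits.max()              # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# toy example: hidden size 4, vocabulary of 10 tokens (illustrative values only)
rng = np.random.default_rng(0)
h = rng.normal(size=4)                  # representation of the seen context
W = rng.normal(size=(10, 4))            # output embedding matrix
p_next = next_token_distribution(h, W)  # probability of each candidate next token
```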
Considerable effort has been dedicated to mitigating toxicity, but existing methods often require drastic modifications to model parameters or the use of computationally intensive …
The rise of code pre-trained models has significantly enhanced various coding tasks, such as code completion, and tools like GitHub Copilot. However, the substantial size of these …
Fine-tuning a language model on a new domain is standard practice for domain adaptation. However, it can be infeasible when it comes to modern large-scale language models such …
In this paper, we study the generation quality of interpolation-based retrieval-augmented language models (LMs). These methods, best exemplified by the KNN-LM, interpolate the …
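The interpolation these methods perform is the defining step of kNN-LM-style models: the final next-token distribution is a fixed-weight mixture of the base LM distribution and a distribution induced by the nearest neighbors of the current context representation in a datastore. A minimal sketch (the interpolation weight and toy distributions below are illustrative, not values from the paper):

```python
import numpy as np

def knn_lm_distribution(p_lm, p_knn, lam):
    """kNN-LM-style interpolation of two next-token distributions:
    p(y | x) = lam * p_knn(y | x) + (1 - lam) * p_lm(y | x)"""
    return lam * p_knn + (1.0 - lam) * p_lm

# toy distributions over a 5-token vocabulary (illustrative values only)
p_lm  = np.array([0.50, 0.20, 0.15, 0.10, 0.05])  # base LM prediction
p_knn = np.array([0.10, 0.60, 0.10, 0.10, 0.10])  # distribution from retrieved neighbors
p = knn_lm_distribution(p_lm, p_knn, lam=0.25)
assert np.isclose(p.sum(), 1.0)                   # still a valid distribution
```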
We introduce PersonaLM: Domain-distributed Span-Aggregated K-nearest N-gram retrieval augmentation to improve language modeling for Automatic Speech Recognition …
Most machine learning models are designed to be self-contained and encode both "knowledge" and "reasoning" in their parameters. However, such models cannot perform …
Recent work on the Retrieval-Enhanced Transformer (RETRO) model has shown that off-loading memory from trainable weights to a retrieval database can significantly improve …
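At a high level, the "retrieval database" here is a store of text chunks keyed by precomputed embeddings; for each input chunk, nearby stored chunks are fetched and supplied to the model as extra context rather than being memorized in its weights. A minimal nearest-neighbor lookup sketch (the function name, embedding dimensions, and cosine similarity are assumptions for illustration, not RETRO's exact implementation):

```python
import numpy as np

def retrieve_neighbor_chunks(query_embedding, db_keys, db_chunks, k=2):
    """Fetch the k stored chunks whose embeddings are closest (by cosine
    similarity) to the embedding of the current input chunk."""
    q = query_embedding / np.linalg.norm(query_embedding)
    keys = db_keys / np.linalg.norm(db_keys, axis=1, keepdims=True)
    sims = keys @ q                    # similarity of every stored chunk to the query
    top = np.argsort(-sims)[:k]        # indices of the k most similar chunks
    return [db_chunks[i] for i in top]

# toy database of three chunks with 2-d embeddings (illustrative values only)
db_keys = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
db_chunks = ["chunk A", "chunk B", "chunk C"]
neighbors = retrieve_neighbor_chunks(np.array([0.9, 0.1]), db_keys, db_chunks, k=2)
```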