Since the proposal of transformers, these models have been limited to bounded input lengths, because of their need to attend to every token in the input. In this work, we propose …
FF Xu, U Alon, G Neubig - International Conference on …, 2023 - proceedings.mlr.press
Language models (LMs) compute the probability of a text by sequentially computing a representation of an already-seen context and using this representation to predict the next …
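The snippet above describes the standard autoregressive step that the retrieval-augmented methods in this list build on. As a minimal sketch (hidden size, vocabulary size, and variable names below are illustrative placeholders, not taken from the paper), the next-token distribution is a softmax over the vocabulary computed from the context representation:

```python
import numpy as np

def next_token_distribution(h, W):
    """Standard LM prediction step: the context representation h (e.g. the
    final hidden state over the already-seen tokens) is projected onto the
    vocabulary and normalized with a softmax."""
    logits = W @ h                      # (vocab_size,)
    logits -= logits.max()              # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# toy example: hidden size 4, vocabulary of 10 tokens (illustrative values only)
rng = np.random.default_rng(0)
h = rng.normal(size=4)                  # representation of the seen context
W = rng.normal(size=(10, 4))            # output embedding matrix
p_next = next_token_distribution(h, W)  # probability of each candidate next token
```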
Considerable effort has been dedicated to mitigating toxicity, but existing methods often require drastic modifications to model parameters or the use of computationally intensive …
The rise of code pre-trained models has significantly enhanced various coding tasks, such as code completion, and tools like GitHub Copilot. However, the substantial size of these …
Fine-tuning a language model on a new domain is standard practice for domain adaptation. However, it can be infeasible when it comes to modern large-scale language models such …
In this paper, we study the generation quality of interpolation-based retrieval-augmented language models (LMs). These methods, best exemplified by the KNN-LM, interpolate the …
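The interpolation these methods perform is the defining step of kNN-LM-style models: the final next-token distribution is a fixed-weight mixture of the base LM distribution and a distribution induced by the nearest neighbors of the current context representation in a datastore. A minimal sketch (the interpolation weight and toy distributions below are illustrative, not values from the paper):

```python
import numpy as np

def knn_lm_distribution(p_lm, p_knn, lam):
    """kNN-LM-style interpolation of two next-token distributions:
    p(y | x) = lam * p_knn(y | x) + (1 - lam) * p_lm(y | x)"""
    return lam * p_knn + (1.0 - lam) * p_lm

# toy distributions over a 5-token vocabulary (illustrative values only)
p_lm  = np.array([0.50, 0.20, 0.15, 0.10, 0.05])  # base LM prediction
p_knn = np.array([0.10, 0.60, 0.10, 0.10, 0.10])  # distribution from retrieved neighbors
p = knn_lm_distribution(p_lm, p_knn, lam=0.25)
assert np.isclose(p.sum(), 1.0)                   # still a valid distribution
```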
We introduce PersonaLM: Domain-distributed Span-Aggregated K-nearest N-gram retrieval augmentation to improve language modeling for Automatic Speech Recognition …
Most machine learning models are designed to be self-contained and encode both "knowledge" and "reasoning" in their parameters. However, such models cannot perform …
Recent work on the Retrieval-Enhanced Transformer (RETRO) model has shown that off-loading memory from trainable weights to a retrieval database can significantly improve …
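At a high level, the "retrieval database" here is a store of text chunks keyed by precomputed embeddings; for each input chunk, nearby stored chunks are fetched and supplied to the model as extra context rather than being memorized in its weights. A minimal nearest-neighbor lookup sketch (the function name, embedding dimensions, and cosine similarity are assumptions for illustration, not RETRO's exact implementation):

```python
import numpy as np

def retrieve_neighbor_chunks(query_embedding, db_keys, db_chunks, k=2):
    """Fetch the k stored chunks whose embeddings are closest (by cosine
    similarity) to the embedding of the current input chunk."""
    q = query_embedding / np.linalg.norm(query_embedding)
    keys = db_keys / np.linalg.norm(db_keys, axis=1, keepdims=True)
    sims = keys @ q                    # similarity of every stored chunk to the query
    top = np.argsort(-sims)[:k]        # indices of the k most similar chunks
    return [db_chunks[i] for i in top]

# toy database of three chunks with 2-d embeddings (illustrative values only)
db_keys = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
db_chunks = ["chunk A", "chunk B", "chunk C"]
neighbors = retrieve_neighbor_chunks(np.array([0.9, 0.1]), db_keys, db_chunks, k=2)
```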