A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Recent advances in deep learning based dialogue systems: A systematic survey

J Ni, T Young, V Pandelea, F Xue… - Artificial intelligence review, 2023 - Springer
Dialogue systems are a popular natural language processing (NLP) task, as they are promising for
real-life applications. They are also complicated, since many NLP tasks deserving study are …

Holistic evaluation of language models

P Liang, R Bommasani, T Lee, D Tsipras… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …

Diffusion-LM improves controllable text generation

X Li, J Thickstun, I Gulrajani… - Advances in Neural …, 2022 - proceedings.neurips.cc
Controlling the behavior of language models (LMs) without re-training is a major open
problem in natural language generation. While recent works have demonstrated successes …
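
To make the idea behind the title concrete: Diffusion-LM runs a Gaussian diffusion process over continuous word-embedding latents and decodes by rounding each latent back to its nearest word embedding. The sketch below illustrates only that forward-noising and rounding machinery; the schedule, dimensions, and toy setup are assumptions for illustration, not the paper's implementation.

# Illustrative sketch of continuous diffusion over word embeddings (not the paper's code).
import torch
import torch.nn as nn

vocab_size, dim, T = 1000, 16, 100
embed = nn.Embedding(vocab_size, dim)            # maps tokens to continuous latents x_0

# Linear noise schedule; alpha_bar[t] is the cumulative product of (1 - beta).
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def forward_noise(x0, t):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)."""
    eps = torch.randn_like(x0)
    a = alpha_bar[t].sqrt().view(-1, 1, 1)
    s = (1.0 - alpha_bar[t]).sqrt().view(-1, 1, 1)
    return a * x0 + s * eps, eps

def round_to_words(x):
    """Decode by snapping each latent to its nearest word embedding (the rounding step)."""
    dists = torch.cdist(x.reshape(-1, dim), embed.weight)     # (batch*seq, vocab)
    return dists.argmin(dim=-1).view(x.shape[0], x.shape[1])

tokens = torch.randint(0, vocab_size, (2, 8))
x0 = embed(tokens)
t = torch.randint(0, T, (2,))
xt, eps = forward_noise(x0, t)                   # training pairs for a denoiser
print(round_to_words(x0).equal(tokens))          # rounding clean latents recovers the tokens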

PaLM: Scaling language modeling with Pathways

A Chowdhery, S Narang, J Devlin, M Bosma… - Journal of Machine …, 2023 - jmlr.org
Large language models have been shown to achieve remarkable performance across a
variety of natural language tasks using few-shot learning, which drastically reduces the …
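
The few-shot learning mentioned in the snippet is in-context prompting: a handful of input-output demonstrations are concatenated ahead of the query so the model performs the task without any parameter updates. A minimal illustrative sketch (the task and formatting are assumptions, unrelated to PaLM's actual evaluation setup):

# Minimal sketch of few-shot (in-context) prompting: demonstrations are concatenated
# into the prompt so the model can infer the task without gradient updates.
def build_few_shot_prompt(examples, query):
    """examples: list of (input, output) pairs shown to the model; query: new input."""
    shots = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{shots}\nInput: {query}\nOutput:"

demo = [("the movie was wonderful", "positive"),
        ("the plot made no sense", "negative")]
print(build_few_shot_prompt(demo, "acting was superb"))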

Super-NaturalInstructions: Generalization via declarative instructions on 1600+ NLP tasks

Y Wang, S Mishra, P Alipoormolabashi, Y Kordi… - arXiv preprint arXiv …, 2022 - arxiv.org
How well can NLP models generalize to a variety of unseen tasks when provided with task
instructions? To address this question, we first introduce Super-NaturalInstructions, a …

LoRA: Low-rank adaptation of large language models

EJ Hu, Y Shen, P Wallis, Z Allen-Zhu, Y Li… - arXiv preprint arXiv …, 2021 - arxiv.org
An important paradigm of natural language processing consists of large-scale pre-training
on general domain data and adaptation to particular tasks or domains. As we pre-train larger …
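
The adaptation method the title names freezes the pretrained weight W0 and trains only a low-rank update, so the effective weight becomes W0 + (alpha/r) * B A. A minimal PyTorch sketch under common conventions (dimensions, rank, and initialization here are illustrative assumptions):

# Minimal sketch of a LoRA-style linear layer: W0 is frozen and only the low-rank
# factors A and B are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                        # frozen pretrained weight W0
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)     # trainable, small random init
        self.B = nn.Parameter(torch.zeros(out_features, r))           # trainable, zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768)
y = layer(torch.randn(4, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(y.shape, trainable)    # torch.Size([4, 768]) and 2 * r * 768 = 12288 trainable parameters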

VeRA: Vector-based random matrix adaptation

DJ Kopiczko, T Blankevoort, YM Asano - arXiv preprint arXiv:2310.11454, 2023 - arxiv.org
Low-rank adaptation (LoRA) is a popular method that reduces the number of trainable
parameters when finetuning large language models, but still faces acute storage challenges …
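
VeRA addresses that storage issue by sharing a single pair of frozen random matrices across all adapted layers and training only two small per-layer scaling vectors. A minimal sketch of that idea, assuming illustrative shapes and initializations rather than the paper's exact configuration:

# Minimal sketch of the VeRA idea: frozen random matrices (A, B) are shared across
# layers; each layer trains only a rank-wise vector d and an output-wise vector b.
import torch
import torch.nn as nn

in_f, out_f, r = 768, 768, 8
shared_A = torch.randn(r, in_f)          # frozen, shared across all adapted layers
shared_B = torch.randn(out_f, r)         # frozen, shared across all adapted layers

class VeRALinear(nn.Module):
    def __init__(self, A, B):
        super().__init__()
        self.base = nn.Linear(in_f, out_f, bias=False)
        self.base.weight.requires_grad_(False)       # frozen pretrained weight W0
        self.register_buffer("A", A)                  # frozen random projections
        self.register_buffer("B", B)
        self.d = nn.Parameter(torch.ones(r))          # trainable per-layer scaling over rank dims
        self.b = nn.Parameter(torch.zeros(out_f))     # trainable per-layer scaling over output dims

    def forward(self, x):
        # delta(x) = Lambda_b B Lambda_d A x, realised with elementwise scalings
        z = (x @ self.A.T) * self.d
        return self.base(x) + (z @ self.B.T) * self.b

layer1, layer2 = VeRALinear(shared_A, shared_B), VeRALinear(shared_A, shared_B)
print(sum(p.numel() for p in layer1.parameters() if p.requires_grad))   # r + out_f = 776 per layer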

Prefix-tuning: Optimizing continuous prompts for generation

XL Li, P Liang - arXiv preprint arXiv:2101.00190, 2021 - arxiv.org
Fine-tuning is the de facto way to leverage large pretrained language models to perform
downstream tasks. However, it modifies all the language model parameters and therefore …
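
Prefix-tuning instead keeps the language model frozen and optimizes a small set of continuous prefix vectors that are prepended to the keys and values of each attention layer. A toy single-layer sketch of that mechanism (the layer and sizes are illustrative assumptions, not the paper's implementation):

# Minimal sketch of prefix-tuning: only the prepended prefix key/value vectors train,
# while the (toy) pretrained attention projection stays frozen.
import torch
import torch.nn as nn

class PrefixAttention(nn.Module):
    def __init__(self, dim=64, prefix_len=5):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.qkv.weight.requires_grad_(False)                  # frozen pretrained projection
        self.prefix_k = nn.Parameter(torch.randn(prefix_len, dim) * 0.02)   # trainable prefix keys
        self.prefix_v = nn.Parameter(torch.randn(prefix_len, dim) * 0.02)   # trainable prefix values
        self.dim = dim

    def forward(self, x):                                      # x: (batch, seq, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        bsz = x.shape[0]
        k = torch.cat([self.prefix_k.expand(bsz, -1, -1), k], dim=1)   # prepend prefix keys
        v = torch.cat([self.prefix_v.expand(bsz, -1, -1), v], dim=1)   # prepend prefix values
        attn = torch.softmax(q @ k.transpose(1, 2) / self.dim ** 0.5, dim=-1)
        return attn @ v

layer = PrefixAttention()
print(layer(torch.randn(2, 10, 64)).shape)                     # torch.Size([2, 10, 64])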

Large language models can be strong differentially private learners

X Li, F Tramer, P Liang, T Hashimoto - arXiv preprint arXiv:2110.05679, 2021 - arxiv.org
Differentially Private (DP) learning has seen limited success for building large deep learning
models of text, and straightforward attempts at applying Differentially Private Stochastic …
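
The method alluded to, Differentially Private Stochastic Gradient Descent (DP-SGD), clips each per-example gradient to a norm bound and adds calibrated Gaussian noise before the update. The sketch below shows only that generic recipe, not the paper's large-model-specific techniques; the toy model, clipping bound C, and noise multiplier sigma are illustrative assumptions:

# Minimal sketch of one DP-SGD step: per-example clipping, Gaussian noise, averaging.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
C, sigma, lr = 1.0, 0.5, 0.1

def dp_sgd_step(xb, yb):
    per_example_grads = []
    for x, y in zip(xb, yb):                                   # per-example gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()])
        g = g * torch.clamp(C / (g.norm() + 1e-12), max=1.0)   # clip to norm bound C
        per_example_grads.append(g)
    noisy = torch.stack(per_example_grads).sum(0)
    noisy = noisy + sigma * C * torch.randn_like(noisy)        # Gaussian noise scaled by sigma * C
    noisy = noisy / len(xb)                                    # average over the batch
    with torch.no_grad():                                      # manual SGD update
        offset = 0
        for p in model.parameters():
            p -= lr * noisy[offset:offset + p.numel()].view_as(p)
            offset += p.numel()

dp_sgd_step(torch.randn(8, 10), torch.randn(8, 1))
print([p.shape for p in model.parameters()])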