A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

Language model behavior: A comprehensive survey

TA Chang, BK Bergen - Computational Linguistics, 2024 - direct.mit.edu
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …

Galactica: A large language model for science

R Taylor, M Kardas, G Cucurull, T Scialom… - arXiv preprint arXiv …, 2022 - arxiv.org
Information overload is a major obstacle to scientific progress. The explosive growth in
scientific literature and data has made it ever harder to discover useful insights in a large …

Taxonomy of risks posed by language models

L Weidinger, J Uesato, M Rauh, C Griffin… - Proceedings of the …, 2022 - dl.acm.org
Responsible innovation on large-scale Language Models (LMs) requires foresight into and
in-depth understanding of the risks these models may pose. This paper develops a …

PaLM: Scaling language modeling with Pathways

A Chowdhery, S Narang, J Devlin, M Bosma… - Journal of Machine …, 2023 - jmlr.org
Large language models have been shown to achieve remarkable performance across a
variety of natural language tasks using few-shot learning, which drastically reduces the …

GLM-130B: An open bilingual pre-trained model

A Zeng, X Liu, Z Du, Z Wang, H Lai, M Ding… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model
with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as …

Using large language models to simulate multiple humans and replicate human subject studies

GV Aher, RI Arriaga, AT Kalai - International Conference on …, 2023 - proceedings.mlr.press
We introduce a new type of test, called a Turing Experiment (TE), for evaluating to what
extent a given language model, such as GPT models, can simulate different aspects of …

Understanding the benefits and challenges of deploying conversational AI leveraging large language models for public health intervention

E Jo, DA Epstein, H Jung, YH Kim - … of the 2023 CHI Conference on …, 2023 - dl.acm.org
Recent large language models (LLMs) have advanced the quality of open-ended
conversations with chatbots. Although LLM-driven chatbots have the potential to support …

Towards measuring the representation of subjective global opinions in language models

E Durmus, K Nguyen, TI Liao, N Schiefer… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) may not equitably represent diverse global perspectives on
societal issues. In this paper, we develop a quantitative framework to evaluate whose …

Safe RLHF: Safe reinforcement learning from human feedback

J Dai, X Pan, R Sun, J Ji, X Xu, M Liu, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
With the development of large language models (LLMs), striking a balance between the
performance and safety of AI systems has never been more critical. However, the inherent …