Machine culture

L Brinkmann, F Baumann, JF Bonnefon… - Nature Human …, 2023 - nature.com
The ability of humans to create and disseminate culture is often credited as the single most
important factor of our success as a species. In this Perspective, we explore the notion of …

Pre-trained language models for text generation: A survey

J Li, T Tang, WX Zhao, JY Nie, JR Wen - ACM Computing Surveys, 2024 - dl.acm.org
Text Generation aims to produce plausible and readable text in human language from input
data. The resurgence of deep learning has greatly advanced this field, in particular, with the …

A categorical archive of ChatGPT failures

A Borji - arXiv preprint arXiv:2302.03494, 2023 - arxiv.org
Large language models have been demonstrated to be valuable in different fields. ChatGPT,
developed by OpenAI, has been trained using massive amounts of data and simulates …

Using large language models to simulate multiple humans and replicate human subject studies

GV Aher, RI Arriaga, AT Kalai - International Conference on …, 2023 - proceedings.mlr.press
We introduce a new type of test, called a Turing Experiment (TE), for evaluating to what
extent a given language model, such as GPT models, can simulate different aspects of …

Towards automated circuit discovery for mechanistic interpretability

A Conmy, A Mavor-Parker, A Lynch… - Advances in …, 2023 - proceedings.neurips.cc
Through considerable effort and intuition, several recent works have reverse-engineered
nontrivial behaviors of transformer models. This paper systematizes the mechanistic …

Mass-editing memory in a transformer

K Meng, AS Sharma, A Andonian, Y Belinkov… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent work has shown exciting promise in updating large language models with new
memories, so as to replace obsolete information or add specialized knowledge. However …

Locating and editing factual associations in GPT

K Meng, D Bau, A Andonian… - Advances in Neural …, 2022 - proceedings.neurips.cc
We analyze the storage and recall of factual associations in autoregressive transformer
language models, finding evidence that these associations correspond to localized, directly …

Training language models to follow instructions with human feedback

L Ouyang, J Wu, X Jiang, D Almeida… - Advances in neural …, 2022 - proceedings.neurips.cc
Making language models bigger does not inherently make them better at following a user's
intent. For example, large language models can generate outputs that are untruthful, toxic, or …

Large language models propagate race-based medicine

JA Omiye, JC Lester, S Spichak, V Rotemberg… - NPJ Digital …, 2023 - nature.com
Large language models (LLMs) are being integrated into healthcare systems, but these
models may recapitulate harmful, race-based medicine. The objective of this study is to …

Interpretability in the wild: a circuit for indirect object identification in GPT-2 small

K Wang, A Variengien, A Conmy, B Shlegeris… - arXiv preprint arXiv …, 2022 - arxiv.org
Research in mechanistic interpretability seeks to explain behaviors of machine learning
models in terms of their internal components. However, most previous work either focuses …