Survey of hallucination in natural language generation

Z Ji, N Lee, R Frieske, T Yu, D Su, Y Xu, E Ishii… - ACM Computing …, 2023 - dl.acm.org
Natural Language Generation (NLG) has improved exponentially in recent years thanks to
the development of sequence-to-sequence deep learning technologies such as Transformer …

Recent advances in deep learning based dialogue systems: A systematic survey

J Ni, T Young, V Pandelea, F Xue… - Artificial intelligence review, 2023 - Springer
Dialogue systems are a popular natural language processing (NLP) task, as they are promising in
real-life applications. They are also complicated, since many NLP tasks deserving study are …

Memory-based model editing at scale

E Mitchell, C Lin, A Bosselut… - International …, 2022 - proceedings.mlr.press
Even the largest neural networks make errors, and once-correct predictions can become
invalid as the world changes. Model editors make local updates to the behavior of base (pre …

Red teaming language models with language models

E Perez, S Huang, F Song, T Cai, R Ring… - arXiv preprint arXiv …, 2022 - arxiv.org
Language Models (LMs) often cannot be deployed because of their potential to harm users
in hard-to-predict ways. Prior work identifies harmful behaviors before deployment by using …

Recipes for building an open-domain chatbot

S Roller, E Dinan, N Goyal, D Ju, M Williamson… - arXiv preprint arXiv …, 2020 - arxiv.org
Building open-domain chatbots is a challenging area for machine learning research. While
prior work has shown that scaling neural models in the number of parameters and the size of …

Quark: Controllable text generation with reinforced unlearning

X Lu, S Welleck, J Hessel, L Jiang… - Advances in neural …, 2022 - proceedings.neurips.cc
Large-scale language models often learn behaviors that are misaligned with user
expectations. Generated text may contain offensive or toxic language, contain significant …

FEQA: A question answering evaluation framework for faithfulness assessment in abstractive summarization

E Durmus, H He, M Diab - arXiv preprint arXiv:2005.03754, 2020 - arxiv.org
Neural abstractive summarization models are prone to generate content inconsistent with
the source document, i.e., unfaithful. Existing automatic metrics do not capture such mistakes …

Pretrained language models for biomedical and clinical tasks: understanding and extending the state-of-the-art

P Lewis, M Ott, J Du, V Stoyanov - Proceedings of the 3rd clinical …, 2020 - aclanthology.org
A large array of pretrained models are available to the biomedical NLP (BioNLP) community.
Finding the best model for a particular task can be difficult and time-consuming. For many …

How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN

RT McCoy, P Smolensky, T Linzen, J Gao… - Transactions of the …, 2023 - direct.mit.edu
Current language models can generate high-quality text. Are they simply copying text they
have seen before, or have they learned generalizable linguistic abstractions? To tease apart …

Locally typical sampling

C Meister, T Pimentel, G Wiher… - Transactions of the …, 2023 - direct.mit.edu
Today's probabilistic language generators fall short when it comes to producing coherent
and fluent text despite the fact that the underlying models perform well under standard …