A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

Survey on factuality in large language models: Knowledge, retrieval and domain-specificity

C Wang, X Liu, Y Yue, X Tang, T Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey addresses the crucial issue of factuality in Large Language Models (LLMs). As
LLMs find applications across diverse domains, the reliability and accuracy of their outputs …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Large language models encode clinical knowledge

K Singhal, S Azizi, T Tu, SS Mahdavi, J Wei, HW Chung… - Nature, 2023 - nature.com
Large language models (LLMs) have demonstrated impressive capabilities, but the bar for
clinical applications is high. Attempts to assess the clinical knowledge of models typically …

The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - arXiv preprint arXiv …, 2023 - arxiv.org
For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing
the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are …

Large language models encode clinical knowledge

K Singhal, S Azizi, T Tu, SS Mahdavi, J Wei… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LLMs) have demonstrated impressive capabilities in natural
language understanding and generation, but the quality bar for medical and clinical …

Inference-time intervention: Eliciting truthful answers from a language model

K Li, O Patel, F Viégas, H Pfister… - Advances in Neural …, 2024 - proceedings.neurips.cc
We introduce Inference-Time Intervention (ITI), a technique designed to enhance
the "truthfulness" of large language models (LLMs). ITI operates by shifting model activations …

Large language models for information retrieval: A survey

Y Zhu, H Yuan, S Wang, J Liu, W Liu, C Deng… - arXiv preprint arXiv …, 2023 - arxiv.org
As a primary means of information acquisition, information retrieval (IR) systems, such as
search engines, have integrated themselves into our daily lives. These systems also serve …

Improving factuality and reasoning in language models through multiagent debate

Y Du, S Li, A Torralba, JB Tenenbaum… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities in language
generation, understanding, and few-shot learning in recent years. An extensive body of work …

Defending ChatGPT against jailbreak attack via self-reminders

Y Xie, J Yi, J Shao, J Curl, L Lyu, Q Chen… - Nature Machine …, 2023 - nature.com
ChatGPT is a societally impactful artificial intelligence tool with millions of users and
integration into products such as Bing. However, the emergence of jailbreak attacks notably …