A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

From black boxes to actionable insights: a perspective on explainable artificial intelligence for scientific discovery

Z Wu, J Chen, Y Li, Y Deng, H Zhao… - Journal of Chemical …, 2023 - ACS Publications
The application of Explainable Artificial Intelligence (XAI) in the field of chemistry has
garnered growing interest for its potential to justify the prediction of black-box machine …

Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs

M Xiong, Z Hu, X Lu, Y Li, J Fu, J He, B Hooi - arXiv preprint arXiv …, 2023 - arxiv.org
The task of empowering large language models (LLMs) to accurately express their
confidence, referred to as confidence elicitation, is essential in ensuring reliable and …

Evaluation and analysis of hallucination in large vision-language models

J Wang, Y Zhou, G Xu, P Shi, C Zhao, H Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Vision-Language Models (LVLMs) have recently achieved remarkable success.
However, LVLMs are still plagued by the hallucination problem, which limits the practicality …

Label-free node classification on graphs with large language models (LLMs)

Z Chen, H Mao, H Wen, H Han, W Jin, H Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
In recent years, there have been remarkable advancements in node classification achieved
by Graph Neural Networks (GNNs). However, they necessitate abundant high-quality labels …

Alignment for honesty

Y Yang, E Chern, X Qiu, G Neubig, P Liu - arXiv preprint arXiv:2312.07000, 2023 - arxiv.org
Recent research has made significant strides in applying alignment techniques to enhance
the helpfulness and harmlessness of large language models (LLMs) in accordance with …

"I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust

SSY Kim, QV Liao, M Vorvoreanu, S Ballard… - The 2024 ACM …, 2024 - dl.acm.org
Widely deployed large language models (LLMs) can produce convincing yet incorrect
outputs, potentially misleading users who may rely on them as if they were correct. To …

An emulator for fine-tuning large language models using small language models

E Mitchell, R Rafailov, A Sharma, C Finn… - arXiv preprint arXiv …, 2023 - arxiv.org
Widely used language models (LMs) are typically built by scaling up a two-stage training
pipeline: a pre-training stage that uses a very large, diverse dataset of text and a fine-tuning …

Decomposing uncertainty for large language models through input clarification ensembling

B Hou, Y Liu, K Qian, J Andreas, S Chang… - arXiv preprint arXiv …, 2023 - arxiv.org
Uncertainty decomposition refers to the task of decomposing the total uncertainty of a model
into data (aleatoric) uncertainty, resulting from the inherent complexity or ambiguity of the …

Quantifying uncertainty in answers from any language model via intrinsic and extrinsic confidence assessment

J Chen, J Mueller - arXiv preprint arXiv:2308.16175, 2023 - arxiv.org
We introduce BSDetector, a method for detecting bad and speculative answers from a
pretrained Large Language Model by estimating a numeric confidence score for any output …