Evaluating the moral beliefs encoded in LLMs

N Scherrer, C Shi, A Feder… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper presents a case study on the design, administration, post-processing, and
evaluation of surveys on large language models (LLMs). It comprises two components: (1) A …

Consistency analysis of ChatGPT

ME Jang, T Lukasiewicz - arXiv preprint arXiv:2303.06273, 2023 - arxiv.org
ChatGPT has gained a huge popularity since its introduction. Its positive aspects have been
reported through many media platforms, and some analyses even showed that ChatGPT …

Internal consistency and self-feedback in large language models: A survey

X Liang, S Song, Z Zheng, H Wang, Q Yu, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) often exhibit deficient reasoning or generate hallucinations.
To address these, studies prefixed with "Self-" such as Self-Consistency, Self-Improve, and …

Quantifying AI psychology: A psychometrics benchmark for large language models

Y Li, Y Huang, H Wang, X Zhang, J Zou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities,
increasingly adopting roles akin to human-like assistants. The broader integration of LLMs …

Evaluating superhuman models with consistency checks

L Fluri, D Paleka, F Tramèr - 2024 IEEE Conference on Secure …, 2024 - ieeexplore.ieee.org
If machine learning models were to achieve superhuman abilities at various reasoning or
decision-making tasks, how would we go about evaluating such models, given that humans …

MEDEC: A benchmark for medical error detection and correction in clinical notes

AB Abacha, W Yim, Y Fu, Z Sun, M Yetisgen… - arXiv preprint arXiv …, 2024 - arxiv.org
Several studies showed that Large Language Models (LLMs) can answer medical questions
correctly, even outperforming the average human score in some medical exams. However …

BRAINTEASER: Lateral Thinking Puzzles for Large Language Models

Y Jiang, F Ilievski, K Ma - arXiv preprint arXiv:2310.05057, 2023 - arxiv.org
The success of language models has inspired the NLP community to attend to tasks that
require implicit and complex reasoning, relying on human-like commonsense mechanisms …

Large language models as reliable knowledge bases?

D Zheng, M Lapata, JZ Pan - arXiv preprint arXiv:2407.13578, 2024 - arxiv.org
The NLP community has recently shown a growing interest in leveraging Large Language
Models (LLMs) for knowledge-intensive tasks, viewing LLMs as potential knowledge bases …

Understanding users' dissatisfaction with ChatGPT responses: Types, resolving tactics, and the effect of knowledge level

Y Kim, J Lee, S Kim, J Park, J Kim - Proceedings of the 29th International …, 2024 - dl.acm.org
Large language models (LLMs) with chat-based capabilities, such as ChatGPT, are widely
used in various workflows. However, due to a limited understanding of these large-scale …

From form(s) to meaning: Probing the semantic depths of language models using multisense consistency

X Ohmer, E Bruni, D Hupkes - Computational Linguistics, 2024 - direct.mit.edu
The staggering pace with which the capabilities of large language models (LLMs) are
increasing, as measured by a range of commonly used natural language understanding …