Inadequacies of large language model benchmarks in the era of generative artificial intelligence

TR McIntosh, T Susnjak, N Arachchilage, T Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid rise in popularity of Large Language Models (LLMs) with emerging capabilities
has spurred public curiosity to evaluate and compare different LLMs, leading many …

MoCa: Measuring human-language model alignment on causal and moral judgment tasks

A Nie, Y Zhang, AS Amdekar, C Piech… - Advances in …, 2023 - proceedings.neurips.cc
Human commonsense understanding of the physical and social world is organized around
intuitive theories. These theories support making causal and moral judgments. When …

Perils and opportunities in using large language models in psychological research

S Abdurahman, M Atari, F Karimi-Malekabadi… - PNAS …, 2024 - academic.oup.com
The emergence of large language models (LLMs) has sparked considerable interest in their
potential application in psychological research, mainly as a model of the human psyche or …

A comprehensive survey of small language models in the era of large language models: Techniques, enhancements, applications, collaboration with LLMs, and …

F Wang, Z Zhang, X Zhang, Z Wu, T Mo, Q Lu… - arXiv preprint arXiv …, 2024 - ai.radensa.ru
Large language models (LLM) have demonstrated emergent abilities in text generation,
question answering, and reasoning, facilitating various tasks and domains. Despite their …

On the humanity of conversational AI: Evaluating the psychological portrayal of LLMs

J Huang, W Wang, EJ Li, MH Lam, S Ren… - The Twelfth …, 2023 - openreview.net
Large Language Models (LLMs) have recently showcased their remarkable capacities, not
only in natural language processing tasks but also across diverse domains such as clinical …

Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench

J Huang, W Wang, EJ Li, MH Lam, S Ren… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently showcased their remarkable capacities, not
only in natural language processing tasks but also across diverse domains such as clinical …

Surprising gender biases in GPT

RA Fulgu, V Capraro - Computers in Human Behavior Reports, 2024 - Elsevier
We present eight experiments exploring gender biases in GPT. Initially, GPT was asked to
generate demographics of a potential writer of forty phrases ostensibly written by …

Evaluating cultural adaptability of a large language model via simulation of synthetic personas

L Kwok, M Bravansky, LD Griffin - arXiv preprint arXiv:2408.06929, 2024 - arxiv.org
The success of Large Language Models (LLMs) in multicultural environments hinges on
their ability to understand users' diverse cultural backgrounds. We measure this capability by …

Analyzing Nobel Prize literature with large language models

Z Yang, Z Liu, J Zhang, C Lu, J Tai, T Zhong… - arXiv preprint arXiv …, 2024 - arxiv.org
This study examines the capabilities of advanced Large Language Models (LLMs),
particularly the o1 model, in the context of literary analysis. The outputs of these models are …

What makes your model a low-empathy or warmth person: Exploring the Origins of Personality in LLMs

S Yang, S Zhu, R Bao, L Liu, Y Cheng, L Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities in generating
human-like text and exhibiting personality traits similar to those in humans. However, the …