M Kosinski - Proceedings of the National Academy of Sciences, 2024 - pnas.org
Eleven large language models (LLMs) were assessed using 40 bespoke false-belief tasks, which are considered a gold standard for testing theory of mind (ToM) in humans. Each task included a …