Evaluation of OpenAI o1: Opportunities and challenges of AGI

T Zhong, Z Liu, Y Pan, Y Zhang, Y Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
This comprehensive study evaluates the performance of OpenAI's o1-preview large
language model across a diverse array of complex reasoning tasks, spanning multiple …

SemEval-2024 Task 9: BRAINTEASER: A novel task defying common sense

Y Jiang, F Ilievski, K Ma - arXiv preprint arXiv:2404.16068, 2024 - arxiv.org
While vertical thinking relies on logical and commonsense reasoning, lateral thinking
requires systems to defy commonsense associations and overwrite them through …

ARN: Analogical Reasoning on Narratives

Z Sourati, F Ilievski, P Sommerauer… - Transactions of the …, 2024 - direct.mit.edu
As a core cognitive skill that enables the transferability of information across domains,
analogical reasoning has been extensively studied for both humans and computational …

Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models

M Nezhurina, L Cipolina-Kun, M Cherti… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are often described as being instances of foundation
models, that is, models that transfer strongly across various tasks and conditions in few-shot …

Anthropocentric bias and the possibility of artificial cognition

R Millière, C Rathkopf - arXiv preprint arXiv:2407.03859, 2024 - arxiv.org
Evaluating the cognitive capacities of large language models (LLMs) requires overcoming
not only anthropomorphic but also anthropocentric biases. This article identifies two types of …

From Text to Life: On the Reciprocal Relationship between Artificial Life and Large Language Models

E Nisioti, C Glanois, E Najarro, A Dai… - Artificial Life …, 2024 - direct.mit.edu
Large Language Models (LLMs) have taken the field of AI by storm, but their
adoption in the field of Artificial Life (ALife) has been, so far, relatively reserved. In this work …

Evidence from counterfactual tasks supports emergent analogical reasoning in large language models

T Webb, KJ Holyoak, H Lu - arXiv preprint arXiv:2404.13070, 2024 - arxiv.org
We recently reported evidence that large language models are capable of solving a wide
range of text-based analogy problems in a zero-shot manner, indicating the presence of an …

The CoExplorer Technology Probe: A Generative AI-Powered Adaptive Interface to Support Intentionality in Planning and Running Video Meetings

GW Park, P Panda, L Tankelevitch… - Proceedings of the 2024 …, 2024 - dl.acm.org
Effective meetings are effortful, but traditional videoconferencing systems offer little support
for reducing this effort across the meeting lifecycle. Generative AI (GenAI) has the potential …

Not All LLM Reasoners Are Created Equal

A Hosseini, A Sordoni, D Toyama, A Courville… - arXiv preprint arXiv …, 2024 - arxiv.org
We study the depth of grade-school math (GSM) problem-solving capabilities of LLMs. To
this end, we evaluate their performance on pairs of existing math word problems together so …

Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative Cognition Approach

G Puebla, JS Bowers - arXiv preprint arXiv:2402.12675, 2024 - arxiv.org
Achieving visual reasoning is a long-term goal of artificial intelligence. In the last decade,
several studies have applied deep neural networks (DNNs) to the task of learning visual …