Language model quality correlates with psychometric predictive power in multiple languages

EG Wilcox, CI Meister, R Cotterell… - Proceedings of the …, 2023 - research-collection.ethz.ch
Surprisal theory (Hale, 2001; Levy, 2008) posits that a word's reading time is proportional to
its surprisal (i.e., to its negative log probability given the preceding context). Since we are …
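
For reference, the surprisal invoked here has the standard form (notation ours, not quoted from the paper):

    s(w_t) = -\log p(w_t \mid w_{<t})

with reading time assumed proportional to s(w_t) under surprisal theory.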

Predictability in Language Comprehension: Prospects and Problems for Surprisal

A Staub - Annual Review of Linguistics, 2024 - annualreviews.org
Surprisal theory proposes that a word's predictability influences processing difficulty
because each word requires the comprehender to update a probability distribution over …
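
The belief-update view mentioned in the snippet is standardly grounded in Levy's (2008) identity: a word's surprisal equals the KL divergence between the comprehender's distributions over interpretations after and before that word (a known result restated here, not quoted from the review):

    s(w_t) = D_{\mathrm{KL}}\big(p(\cdot \mid w_{\le t}) \,\|\, p(\cdot \mid w_{<t})\big)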

Large-scale benchmark yields no evidence that language model surprisal explains syntactic disambiguation difficulty

KJ Huang, S Arehalli, M Kugemoto, C Muxica… - Journal of Memory and …, 2024 - Elsevier
Prediction has been proposed as an overarching principle that explains human information
processing in language and beyond. To what degree can processing difficulty in …

The linearity of the effect of surprisal on reading times across languages

W Xu, J Chon, T Liu, R Futrell - Findings of the Association for …, 2023 - aclanthology.org
In psycholinguistics, surprisal theory posits that the amount of online processing effort
expended by a human comprehender per word positively correlates with the surprisal of that …
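
The linearity at issue is whether the linking function from surprisal to reading time is affine; in the standard schematic form (an illustrative regression, not the paper's exact model):

    \mathrm{RT}(w_t) = \alpha + \beta \, s(w_t) + \varepsilon, \qquad s(w_t) = -\log_2 p(w_t \mid w_{<t})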

Word frequency and predictability dissociate in naturalistic reading

C Shain - Open Mind, 2024 - direct.mit.edu
Many studies of human language processing have shown that readers slow down at less
frequent or less predictable words, but there is debate about whether frequency and …
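
The frequency/predictability dissociation is typically tested by entering both as predictors of the same reading times and asking whether each explains unique variance; schematically (illustrative notation, not Shain's exact model):

    \mathrm{RT}(w_t) = \alpha + \beta_{\mathrm{freq}} \,(-\log p(w_t)) + \beta_{\mathrm{surp}} \,(-\log p(w_t \mid w_{<t})) + \varepsilon

where the first term is unigram (frequency) information and the second is contextual surprisal.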

Frequency Explains the Inverse Correlation of Large Language Models' Size, Training Data Amount, and Surprisal's Fit to Reading Times

BD Oh, S Yue, W Schuler - arXiv preprint arXiv:2402.02255, 2024 - arxiv.org
Recent studies have shown that as Transformer-based language models become larger and
are trained on very large amounts of data, the fit of their surprisal estimates to naturalistic …

Psychometric predictive power of large language models

T Kuribayashi, Y Oseki, T Baldwin - arXiv preprint arXiv:2311.07484, 2023 - arxiv.org
Next-word probabilities from language models have been shown to successfully simulate
human reading behavior. Building on this, we show that, interestingly, instruction-tuned …
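
Psychometric predictive power is usually measured by extracting per-word surprisals from a language model and testing how much they improve a reading-time regression. A minimal sketch of the surprisal-extraction step, assuming the Hugging Face transformers library and GPT-2 (illustrative only, not the paper's code):

    import math
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def token_surprisals(text: str):
        # Surprisal (in bits) of each token given its preceding context.
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits
        # Position t's logits predict token t+1, so align logits[:-1] with ids[1:].
        logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
        targets = ids[0, 1:]
        nats = -logprobs[torch.arange(targets.size(0)), targets]
        bits = nats / math.log(2.0)
        return list(zip(tokenizer.convert_ids_to_tokens(targets.tolist()), bits.tolist()))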

Leading Whitespaces of Language Models' Subword Vocabularies Pose a Confound for Calculating Word Probabilities

BD Oh, W Schuler - arXiv preprint arXiv:2406.10851, 2024 - arxiv.org
Word-by-word conditional probabilities from Transformer-based language models are
increasingly being used to evaluate their predictions over minimal pairs or to model the …
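
The confound: BPE vocabularies like GPT-2's attach each word's leading space to its first subword (the "Ġ" marker), so the probability of the whitespace is folded into the following word when subword surprisals are summed by the chain rule. A sketch of that standard aggregation, consuming the output of a token-level surprisal function like the one sketched above (illustrative, not the paper's code):

    def word_surprisals(token_surprisals):
        # token_surprisals: list of (subword, surprisal) pairs, GPT-2 style,
        # where a leading "Ġ" marks the whitespace before a new word.
        # Chain rule: a word's surprisal is the sum of its subwords' surprisals.
        # Note the whitespace probability is charged to the *next* word,
        # which is the confound this paper identifies.
        words, current, total = [], "", 0.0
        for tok, s in token_surprisals:
            if tok.startswith("Ġ") and current:
                words.append((current, total))
                current, total = "", 0.0
            current += tok.lstrip("Ġ")
            total += s
        if current:
            words.append((current, total))
        return words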

Word length and frequency effects on text reading are highly similar in 12 alphabetic languages

V Kuperman, S Schroeder, D Gnetov - Journal of Memory and Language, 2024 - Elsevier
Reading research robustly finds that shorter and more frequent words are recognized faster
and skipped more often than longer and less frequent words. An empirical question that has …

Temperature-Scaling Surprisal Estimates Improve Fit to Human Reading Times – But Does It Do So for the "Right Reasons"?

T Liu, I Škrjanec, V Demberg - ICLR 2024 Workshop on …, 2024 - openreview.net
A wide body of evidence shows that human language processing difficulty is predicted by
the information-theoretic measure surprisal, a word's negative log probability in context …
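
Temperature scaling divides the model's logits z by a temperature T > 0 before the softmax, flattening (T > 1) or sharpening (T < 1) the predictive distribution; the rescaled surprisal is then (standard definition, notation not from the paper):

    p_T(w \mid c) = \frac{\exp(z_w / T)}{\sum_{w'} \exp(z_{w'} / T)}, \qquad s_T(w) = -\log p_T(w \mid c)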