psycholinguistic benchmark known as the cloze task, which measures next-word
predictability. However, LMs lack the rich set of experiences that people have, and humans
can be highly creative. To assess whether these models reach human parity on their training objective, we
compare the predictions of three neural language models to those of human participants in a
freely available behavioral dataset (Luke & Christianson, 2016). Our results show that while …