Surprisal theory posits that less-predictable words should take more time to process, with word predictability quantified as surprisal, i.e., negative log probability in context. While …
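The surprisal measure defined above can be sketched in a few lines; the probabilities here are illustrative stand-ins for values a language model would assign in context:

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits: the negative log2 of a word's in-context probability."""
    return -math.log2(prob)

# Illustrative probabilities only: a predictable word (p = 0.5) carries
# 1 bit of surprisal; a less predictable one (p = 0.0625) carries 4 bits.
print(surprisal(0.5))     # 1.0
print(surprisal(0.0625))  # 4.0
```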
The uniform information density (UID) hypothesis posits a preference among language users for utterances structured such that information is distributed uniformly across a signal …
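One common way to operationalize (non-)uniformity of information, sketched here rather than taken from the abstract above, is the variance of per-word surprisal across an utterance; lower variance means information is spread more evenly:

```python
import math

def surprisal_variance(probs):
    """Variance of per-word surprisal (in bits) across an utterance;
    lower values indicate a more uniform spread of information."""
    s = [-math.log2(p) for p in probs]
    mean = sum(s) / len(s)
    return sum((x - mean) ** 2 for x in s) / len(s)

# Perfectly uniform: every word equally predictable, so variance is 0.
print(surprisal_variance([0.25, 0.25, 0.25]))    # 0.0
# Uneven: one very surprising word (p = 2**-7) raises the variance.
print(surprisal_variance([0.5, 0.5, 0.0078125]))  # 8.0
```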
When generating natural language from neural probabilistic models, high probability does not always coincide with high quality: It has often been observed that mode-seeking …
Zipf (1935) posited that wordforms are optimized to minimize utterances' communicative costs. Under the assumption that cost is given by an utterance's length, he supported this …
While natural languages differ widely in both canonical word order and word order flexibility, their word orders still follow shared cross-linguistic statistical patterns, often attributed to …
Self-supervised representation learning for speech often involves a quantization step that transforms the acoustic input into discrete units. However, it remains unclear how to …
Prosody--the suprasegmental component of speech, including pitch, loudness, and tempo--carries critical aspects of meaning. However, the relationship between the information …
This paper describes a method to enrich lexical resources with content relating to linguistic diversity, based on knowledge from the field of lexical typology. We capture the …
Grammatical cues are sometimes redundant with word meanings in natural language. For instance, English word order rules constrain the word order of a sentence like “The dog …