Computational Constancy Measures of Texts—Yule's K and Rényi's Entropy

K Tanaka-Ishii, S Aihara - Computational Linguistics, 2015 - direct.mit.edu
This article presents a mathematical and empirical verification of computational constancy
measures for natural language text. A constancy measure characterizes a given text by …
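
The snippet names Yule's K as a constancy measure. For orientation, a minimal sketch of the textbook formula K = 10^4 (Σ_w f_w² − N)/N², computed from word-type frequencies f_w over a text of N tokens; the exact variant verified in the article is not shown in the excerpt.

```python
from collections import Counter

def yules_k(tokens):
    """Textbook Yule's K: 10^4 * (sum_w f_w^2 - N) / N^2,
    where f_w are word-type frequencies and N is the number of tokens."""
    freqs = Counter(tokens)
    n = sum(freqs.values())
    s2 = sum(f * f for f in freqs.values())
    return 1e4 * (s2 - n) / (n * n)

# K is designed to be (roughly) independent of text length.
text = "the cat sat on the mat and the dog sat on the rug".split()
print(yules_k(text))
```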

A simplistic model of neural scaling laws: Multiperiodic Santa Fe processes

Ł Dębowski - arXiv preprint arXiv:2302.09049, 2023 - arxiv.org
It was observed that large language models exhibit a power-law decay of cross entropy with
respect to the number of parameters and training tokens. When extrapolated literally, this …
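
The power-law decay mentioned in the snippet is usually written in the Chinchilla-style form L(N, D) = E + A·N^(−α) + B·D^(−β) for N parameters and D training tokens. A hedged illustration of that functional form; the constants E, A, B, α, β below are placeholders invented for the example, not values from the cited paper.

```python
def scaling_law_loss(n_params, n_tokens, E=1.7, A=400.0, B=410.0,
                     alpha=0.34, beta=0.28):
    """Illustrative Chinchilla-style scaling law L(N, D) = E + A*N^-alpha + B*D^-beta.
    All constants here are placeholders, not fitted values from the cited paper."""
    return E + A * n_params ** (-alpha) + B * n_tokens ** (-beta)

# Growing the model and the data pushes the loss toward the irreducible term E.
print(scaling_law_loss(1e9, 1e10))
print(scaling_law_loss(2e9, 2e10))
```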

Is natural language a perigraphic process? The theorem about facts and words revisited

Ł Dębowski - Entropy, 2018 - mdpi.com
As we discuss, a stationary stochastic process is nonergodic when a random persistent topic
can be detected in the infinite random text sampled from the process, whereas we call the …
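
A standard toy example of the nonergodic behaviour the snippet describes is a stationary mixture of two ergodic sources: the persistent "topic" is a coin bias drawn once and kept forever, so time averages differ between realizations. A minimal sketch, with the specific biases 0.2 and 0.8 chosen arbitrarily for illustration.

```python
import random

def sample_mixture_text(length, seed=None):
    """Stationary but nonergodic toy source: a persistent 'topic' (here a coin
    bias) is drawn once and then governs the whole realization."""
    rng = random.Random(seed)
    bias = rng.choice([0.2, 0.8])          # the persistent random topic
    return bias, [1 if rng.random() < bias else 0 for _ in range(length)]

# Time averages converge to the hidden bias, not to the ensemble mean 0.5.
for seed in range(4):
    bias, xs = sample_mixture_text(10_000, seed)
    print(seed, bias, sum(xs) / len(xs))
```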

On the vocabulary of grammar-based codes and the logical consistency of texts

Ł Dębowski - IEEE Transactions on Information Theory, 2011 - ieeexplore.ieee.org
This paper presents a new interpretation for Zipf–Mandelbrot's law in natural language
which rests on two areas of information theory. Firstly, we construct a new class of grammar …
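
For reference, the Zipf–Mandelbrot law mentioned in the snippet relates a word's frequency f to its frequency rank r in the generic form below; the parameters a, b and the constant C depend on the corpus, and this is the standard statement of the law rather than the paper's own construction.

```latex
% Zipf--Mandelbrot rank--frequency law (generic form, parameters a, b > 0):
\[
  f(r) \;=\; \frac{C}{(r + b)^{a}}, \qquad r = 1, 2, 3, \dots
\]
% Zipf's original law is the special case b = 0, a \approx 1.
```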

Infinite excess entropy processes with countable-state generators

NF Travers, JP Crutchfield - Entropy, 2014 - mdpi.com
We present two examples of finite-alphabet, infinite excess entropy processes generated by
stationary hidden Markov models (HMMs) with countable state sets. The first, simpler …
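
The quantity in the title, excess entropy, is conventionally defined as the mutual information between the infinite past and the infinite future of a stationary process; "infinite excess entropy" means this limit diverges. The standard definition, with h the entropy rate:

```latex
% Excess entropy of a stationary process (X_t):
\[
  E \;=\; \lim_{n \to \infty} I\bigl(X_{-n+1:0};\, X_{1:n}\bigr)
    \;=\; \sum_{n=1}^{\infty} \bigl( H(X_n \mid X_{1:n-1}) - h \bigr),
\]
% where h is the entropy rate; "infinite excess entropy" means the limit diverges.
```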

A refutation of finite-state language models through Zipf's law for factual knowledge

Ł Dębowski - Entropy, 2021 - mdpi.com
We present a hypothetical argument against finite-state processes in statistical language
modeling that is based on semantics rather than syntax. In this theoretical model, we …
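
The kind of model hinted at here is, in Dębowski's earlier work, the Santa Fe process: each symbol is a pair (k, z_k), where k follows a Zipf-like law and z_k is an immutable random bit (a "fact"). A toy sketch under that assumption; the excerpt does not spell out the exact construction used in this paper, and the function name and parameters below are illustrative.

```python
import random

def santa_fe_sample(length, alpha=1.5, max_k=10_000, seed=0):
    """Toy Santa Fe-like process: each symbol is a pair (k, z_k), where k is
    drawn from a Zipf-like law P(k) ~ k^-alpha and z_k is an immutable random
    bit (a "fact") fixed once for the whole text.  Reproducing all facts
    consistently requires unbounded memory, hence no finite-state model."""
    rng = random.Random(seed)
    weights = [k ** (-alpha) for k in range(1, max_k + 1)]
    ks = rng.choices(range(1, max_k + 1), weights=weights, k=length)
    facts = {}                                   # the bits z_k, fixed lazily
    text = []
    for k in ks:
        if k not in facts:
            facts[k] = rng.randint(0, 1)
        text.append((k, facts[k]))
    return text

print(santa_fe_sample(5))
```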

Excess entropy in natural language: Present state and perspectives

Ł Dębowski - Chaos: An Interdisciplinary Journal of Nonlinear …, 2011 - pubs.aip.org
We review recent progress in understanding the meaning of mutual information in natural
language. Let us define words in a text as strings that occur sufficiently often. In a few …
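
The quantitative backbone of this line of work is Hilberg's conjecture: mutual information between adjacent text blocks grows like a power law, which makes the excess entropy diverge while the entropy rate stays finite. In symbols (Hilberg's original estimate put β near 0.5):

```latex
% Hilberg's conjecture: mutual information between adjacent blocks of text
% grows like a power law,
\[
  I\bigl(X_{1:n};\, X_{n+1:2n}\bigr) \;\propto\; n^{\beta},
  \qquad 0 < \beta < 1,
\]
% so the excess entropy diverges although the entropy rate remains finite.
```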

Universal densities exist for every finite reference measure

Ł Dębowski - IEEE Transactions on Information Theory, 2023 - ieeexplore.ieee.org
As is known, universal codes, which estimate the entropy rate consistently, exist for
stationary ergodic sources over finite alphabets but not over countably infinite ones. We …
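
A concrete instance of the finite-alphabet case mentioned in the snippet is the Lempel–Ziv family: the per-symbol LZ78 code length converges to the entropy rate for stationary ergodic sources over a finite alphabet. A crude illustrative estimator along those lines, not the construction from the paper:

```python
from math import log2

def lz78_entropy_rate_estimate(seq, alphabet_size):
    """Crude entropy-rate estimate via LZ78 incremental parsing: with c phrases,
    the code length is about c * (log2 c + log2 |alphabet|) bits, and per symbol
    this converges to the entropy rate for stationary ergodic finite-alphabet
    sources (an illustration only)."""
    phrases, current, c = set(), "", 0
    for symbol in seq:
        current += symbol
        if current not in phrases:
            phrases.add(current)
            c += 1
            current = ""
    if current:                         # count an unfinished final phrase
        c += 1
    return c * (log2(max(c, 2)) + log2(alphabet_size)) / len(seq)

print(lz78_entropy_rate_estimate("abab" * 1000, alphabet_size=2))
```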

Cross entropy of neural language models at infinity—a new bound of the entropy rate

S Takahashi, K Tanaka-Ishii - Entropy, 2018 - mdpi.com
Neural language models have drawn a lot of attention for their strong ability to predict
natural language text. In this paper, we estimate the entropy rate of natural language with …
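
The underlying bound is simple: the per-symbol cross entropy that any model assigns to a long text from a stationary ergodic source cannot fall below the source's entropy rate, so a well-trained language model yields an upper bound on it. A minimal sketch of the bookkeeping, with hypothetical per-symbol probabilities standing in for a trained model's outputs:

```python
from math import log2

def bits_per_symbol(model_probs):
    """Empirical cross entropy in bits per symbol, computed from the
    probabilities a language model assigned to each observed symbol.
    For long texts from a stationary ergodic source this upper-bounds
    the entropy rate, whichever model produced the probabilities."""
    return -sum(log2(p) for p in model_probs) / len(model_probs)

# Hypothetical per-character probabilities emitted by some trained model:
print(bits_per_symbol([0.5, 0.25, 0.9, 0.1, 0.6]))
```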

Mixing, ergodic, and nonergodic processes with rapidly growing information between blocks

Ł Dębowski - IEEE Transactions on Information Theory, 2012 - ieeexplore.ieee.org
We construct mixing processes over an infinite alphabet and ergodic processes over a finite
alphabet for which Shannon mutual information between adjacent blocks of length n grows …
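
Block mutual information of the kind studied here can be estimated, for toy processes where many independent samples are available, with a plug-in estimator I(X_{1:n}; X_{n+1:2n}) = H(left) + H(right) − H(joint). A rough sketch, strongly biased for realistic block lengths; the function name and setup are illustrative, not from the paper.

```python
import random
from collections import Counter
from math import log2

def plugin_block_mi(samples, n):
    """Plug-in estimate of I(X_{1:n}; X_{n+1:2n}) from many independent strings
    of length 2n (toy estimator, strongly biased when blocks get long)."""
    def entropy(counts, total):
        return -sum(c / total * log2(c / total) for c in counts.values())
    m = len(samples)
    left = Counter(s[:n] for s in samples)
    right = Counter(s[n:2 * n] for s in samples)
    joint = Counter(s[:2 * n] for s in samples)
    return entropy(left, m) + entropy(right, m) - entropy(joint, m)

# Sanity check on i.i.d. strings, where the true block mutual information is 0:
rng = random.Random(0)
data = ["".join(rng.choice("ab") for _ in range(8)) for _ in range(5000)]
print(plugin_block_mi(data, 4))
```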