Entropy rate estimates for natural language—A new extrapolation of compressed large-scale corpora

R Takahira, K Tanaka-Ishii, Ł Dębowski - Entropy, 2016 - mdpi.com
One of the fundamental questions about human language is whether its entropy rate is
positive. The entropy rate measures the average amount of information communicated per …

A simplistic model of neural scaling laws: Multiperiodic Santa Fe processes

Ł Dębowski - arXiv preprint arXiv:2302.09049, 2023 - arxiv.org
It was observed that large language models exhibit a power-law decay of cross entropy with
respect to the number of parameters and training tokens. When extrapolated literally, this …

Is natural language a perigraphic process? The theorem about facts and words revisited

Ł Dębowski - Entropy, 2018 - mdpi.com
As we discuss, a stationary stochastic process is nonergodic when a random persistent topic
can be detected in the infinite random text sampled from the process, whereas we call the …

A refutation of finite-state language models through Zipf's law for factual knowledge

Ł Dębowski - Entropy, 2021 - mdpi.com
We present a hypothetical argument against finite-state processes in statistical language
modeling that is based on semantics rather than syntax. In this theoretical model, we …

Cross entropy of neural language models at infinity—a new bound of the entropy rate

S Takahashi, K Tanaka-Ishii - Entropy, 2018 - mdpi.com
Neural language models have drawn a lot of attention for their strong ability to predict
natural language text. In this paper, we estimate the entropy rate of natural language with …

Maximal repetition and zero entropy rate

Ł Dębowski - IEEE Transactions on Information Theory, 2017 - ieeexplore.ieee.org
Maximal repetition of a string is the maximal length of a repeated substring. This paper
investigates maximal repetition of strings drawn from stochastic processes. Strengthening …

Recurrence and repetition times in the case of a stretched exponential growth

Ł Dębowski - arXiv preprint arXiv:2306.14703, 2023 - arxiv.org
By an analogy to the duality between the recurrence time and the longest match length, we
introduce a quantity dual to the maximal repetition length, which we call the repetition time …

Regular Hilberg processes: An example of processes with a vanishing entropy rate

Ł Dębowski - IEEE Transactions on Information Theory, 2017 - ieeexplore.ieee.org
A regular Hilberg process is a stationary process that satisfies both a hyperlogarithmic
growth of maximal repetition and a power-law growth of topological entropy, which are a …

Hilberg's Conjecture–a Challenge for Machine Learning

Ł Dębowski - Schedae Informaticae, 2014 - bibliotekanauki.pl
We review three mathematical developments linked with Hilberg's conjecture–a hypothesis
about the power-law growth of entropy of texts in natural language, which sets up a …

Natural Language Is Not A Finite-State Process: Evidence from Three Statistical Power Laws

Ł Dębowski - researchgate.net
BF Skinner. Verbal Behavior. Prentice Hall, 1957. Skinner-like argument: The human brain
consists of a billion neurons (a finite number). Assuming that each neuron can be in two …