On the vocabulary of grammar-based codes and the logical consistency of texts

Ł Dębowski - IEEE Transactions on Information Theory, 2011 - ieeexplore.ieee.org
This paper presents a new interpretation for Zipf–Mandelbrot's law in natural language
which rests on two areas of information theory. Firstly, we construct a new class of grammar …

A refutation of finite-state language models through Zipf's law for factual knowledge

Ł Dębowski - Entropy, 2021 - mdpi.com
We present a hypothetical argument against finite-state processes in statistical language
modeling that is based on semantics rather than syntax. In this theoretical model, we …

Excess entropy in natural language: Present state and perspectives

Ł Dębowski - Chaos: An Interdisciplinary Journal of Nonlinear …, 2011 - pubs.aip.org
We review recent progress in understanding the meaning of mutual information in natural
language. Let us define words in a text as strings that occur sufficiently often. In a few …

Mixing, ergodic, and nonergodic processes with rapidly growing information between blocks

Ł Dębowski - IEEE Transactions on Information Theory, 2012 - ieeexplore.ieee.org
We construct mixing processes over an infinite alphabet and ergodic processes over a finite
alphabet for which Shannon mutual information between adjacent blocks of length n grows …

Maximal Repetitions in Written Texts: Finite Energy Hypothesis vs. Strong Hilberg Conjecture

Ł Dębowski - Entropy, 2015 - mdpi.com
The article discusses two mutually-incompatible hypotheses about the stochastic
mechanism of the generation of texts in natural language, which could be related to entropy …

On hidden Markov processes with infinite excess entropy

Ł Dębowski - Journal of Theoretical Probability, 2014 - Springer
We investigate stationary hidden Markov processes for which mutual information between
the past and the future is infinite. It is assumed that the number of observable states is finite …

The relaxed Hilberg conjecture: A review and new experimental support

Ł Dębowski - Journal of Quantitative Linguistics, 2015 - Taylor & Francis
The relaxed Hilberg conjecture states that the mutual information between two adjacent
blocks of text in natural language grows as a power of the block length. The present paper …

From Letters to Words and Back: Invertible Coding of Stationary Measures

Ł Dębowski - arXiv preprint arXiv:2409.13600, 2024 - arxiv.org
Motivated by problems of statistical language modeling, we consider probability measures
on infinite sequences over two countable alphabets of a different cardinality, such as letters …

Regular Hilberg processes: An example of processes with a vanishing entropy rate

Ł Dębowski - IEEE Transactions on Information Theory, 2017 - ieeexplore.ieee.org
A regular Hilberg process is a stationary process that satisfies both a hyperlogarithmic
growth of maximal repetition and a power-law growth of topological entropy, which are a …

Hilberg's Conjecture–a Challenge for Machine Learning

Ł Dębowski - Schedae Informaticae, 2014 - bibliotekanauki.pl
We review three mathematical developments linked with Hilberg's conjecture–a hypothesis
about the power-law growth of entropy of texts in natural language, which sets up a …