We present a hypothetical argument against finite-state processes in statistical language modeling that is based on semantics rather than syntax. In this theoretical model, we …
Ł Dębowski - Chaos: An Interdisciplinary Journal of Nonlinear …, 2011 - pubs.aip.org
We review recent progress in understanding the meaning of mutual information in natural language. Let us define words in a text as strings that occur sufficiently often. In a few …
Ł Dębowski - IEEE Transactions on Information Theory, 2012 - ieeexplore.ieee.org
We construct mixing processes over an infinite alphabet and ergodic processes over a finite alphabet for which Shannon mutual information between adjacent blocks of length n grows …
The article discusses two mutually incompatible hypotheses about the stochastic mechanism of the generation of texts in natural language, which could be related to entropy …
Ł Dębowski - Journal of Theoretical Probability, 2014 - Springer
We investigate stationary hidden Markov processes for which mutual information between the past and the future is infinite. It is assumed that the number of observable states is finite …
Ł Dębowski - Journal of Quantitative Linguistics, 2015 - Taylor & Francis
The relaxed Hilberg conjecture states that the mutual information between two adjacent blocks of text in natural language grows as a power of the block length. The present paper …
Motivated by problems of statistical language modeling, we consider probability measures on infinite sequences over two countable alphabets of different cardinalities, such as letters …
Ł Dębowski - IEEE Transactions on Information Theory, 2017 - ieeexplore.ieee.org
A regular Hilberg process is a stationary process that satisfies both a hyperlogarithmic growth of maximal repetition and a power-law growth of topological entropy, which are a …
We review three mathematical developments linked with Hilberg's conjecture, a hypothesis about the power-law growth of entropy of texts in natural language, which sets up a …