Generalized Good-Turing improves missing mass estimation

A Painsky - Journal of the American Statistical Association, 2023 - Taylor & Francis
Consider a finite sample from an unknown distribution over a countable alphabet. The
missing mass refers to the probability of symbols that do not appear in the sample …

A data-driven missing mass estimation framework

A Painsky - 2022 IEEE International Symposium on Information …, 2022 - ieeexplore.ieee.org
Consider a finite sample from an unknown distribution over a countable alphabet. The
missing mass refers to the probability of symbols that do not appear in the sample. Missing …

Confidence intervals for unobserved events

A Painsky - arXiv preprint arXiv:2211.03052, 2022 - arxiv.org
Consider a finite sample from an unknown distribution over a countable alphabet.
Unobserved events are alphabet symbols which do not appear in the sample. Estimating the …

Consistent estimation of small masses in feature sampling

F Ayed, M Battiston, F Camerlenghi, S Favaro - Journal of Machine …, 2021 - jmlr.org
Consider an (observable) random sample of size n from an infinite population of individuals,
each individual being endowed with a finite set of "features" from a collection of features (Fj) …

On missing mass variance

M Skorski - arXiv preprint arXiv:2104.07028, 2021 - arxiv.org
The missing mass refers to the probability of elements not observed in a sample, and since
the work of Good and Turing during WWII, has been studied extensively in many areas …

Mean-squared accuracy of Good-Turing estimator

M Skorski - 2021 IEEE International Symposium on Information …, 2021 - ieeexplore.ieee.org
The brilliant method due to Good and Turing allows for estimating objects not occurring in a
sample. The problem, known under names “sample coverage” or “missing mass” goes back …

Bayesian nonparametric prediction: from species to features

L Masoero, F Camerlenghi, S Favaro… - Preface XIX 1 Plenary …, 2021 - re.public.polimi.it
In species sampling models, observations represent the species' labels of distinct animals in
a population. Feature sampling models generalize species sampling models by allowing …

On Mean-Squared Error of Good-Turing Estimator

M Skorski - researchgate.net
What percentage of the population is not represented in a sample? This problem, known
under names “sample coverage” or “missing mass”, goes back to the cryptographic work of …

Consistent estimation of the missing mass for feature models

F Ayed, M Battiston, F Camerlenghi… - arXiv preprint arXiv …, 2019 - arxiv.org
Feature models are popular in machine learning and they have been recently used to solve
many unsupervised learning problems. In these models every observation is endowed with …