Convergence guarantees for the Good-Turing estimator

A Painsky - Journal of Machine Learning Research, 2022 - jmlr.org
Consider a finite sample from an unknown distribution over a countable alphabet. The
occupancy probability (OP) refers to the total probability of symbols that appear exactly k …

Generalized Good-Turing improves missing mass estimation

A Painsky - Journal of the American Statistical Association, 2023 - Taylor & Francis
Consider a finite sample from an unknown distribution over a countable alphabet. The
missing mass refers to the probability of symbols that do not appear in the sample …

Confidence Intervals for Parameters of Unobserved Events

A Painsky - Journal of the American Statistical Association, 2024 - Taylor & Francis
Consider a finite sample from an unknown distribution over a countable alphabet.
Unobserved events are alphabet symbols which do not appear in the sample. Estimating the …

Just Wing It: Near-Optimal Estimation of Missing Mass in a Markovian Sequence

A Pananjady, V Muthukumar, A Thangaraj - Journal of Machine Learning …, 2024 - jmlr.org
We study the problem of estimating the stationary mass, also called the unigram mass,
that is missing from a single trajectory of a discrete-time, ergodic Markov chain. This problem …

Near-optimal estimation of the unseen under regularly varying tail populations

S Favaro, Z Naulet - Bernoulli, 2023 - projecteuclid.org
Bernoulli 29(4), 2023, 3423–3442. https://doi.org/10.3150/23-BEJ1589 …

A data-driven missing mass estimation framework

A Painsky - 2022 IEEE International Symposium on Information …, 2022 - ieeexplore.ieee.org
Consider a finite sample from an unknown distribution over a countable alphabet. The
missing mass refers to the probability of symbols that do not appear in the sample. Missing …

Confidence intervals for unobserved events

A Painsky - arXiv preprint arXiv:2211.03052, 2022 - arxiv.org
Consider a finite sample from an unknown distribution over a countable alphabet.
Unobserved events are alphabet symbols which do not appear in the sample. Estimating the …

Necessary and sufficient conditions for the asymptotic normality of higher order Turing estimators

J Chang, M Grabchak - Bernoulli, 2023 - projecteuclid.org
Bernoulli 29(4), 2023, 3369–3395. https://doi.org/10.3150/23-BEJ1587 …

Just Wing It: Optimal Estimation of Missing Mass in a Markovian Sequence

A Pananjady, V Muthukumar, A Thangaraj - arXiv preprint arXiv …, 2024 - arxiv.org
We study the problem of estimating the stationary mass, also called the unigram mass, that
is missing from a single trajectory of a discrete-time, ergodic Markov chain. This problem has …

Bayesian nonparametric estimation of coverage probabilities and distinct counts from sketched data

S Favaro, M Sesia - arXiv preprint arXiv:2209.02135, 2022 - arxiv.org
The estimation of coverage probabilities, and in particular of the missing mass, is a classical
statistical problem with applications in numerous scientific fields. In this paper, we study this …