Applicability of machine learning in spam and phishing email filtering: review and approaches

T Gangavarapu, CD Jaidhar, B Chanduka - Artificial Intelligence Review, 2020 - Springer
With the influx of technological advancements and the increased simplicity in
communication, especially through emails, the upsurge in the volume of unsolicited bulk …

Determining sequencing depth in a single-cell RNA-seq experiment

MJ Zhang, V Ntranos, D Tse - Nature Communications, 2020 - nature.com
An underlying question for virtually all single-cell RNA sequencing experiments is how to
allocate the limited sequencing budget: deep sequencing of a few cells or shallow …

Estimating mutual information for discrete-continuous mixtures

W Gao, S Kannan, S Oh… - Advances in neural …, 2017 - proceedings.neurips.cc
Estimation of mutual information from observed samples is a basic primitive in machine
learning, useful in several learning tasks including correlation mining, information …

Minimax rates of entropy estimation on large alphabets via best polynomial approximation

Y Wu, P Yang - IEEE Transactions on Information Theory, 2016 - ieeexplore.ieee.org
Consider the problem of estimating the Shannon entropy of a distribution over k elements
from n independent samples. We show that the minimax mean-square error is within the …

A survey on distribution testing: Your data is big. But is it blue?

CL Canonne - Theory of Computing, 2020 - theoryofcomputing.org
The field of property testing originated in work on program checking, and has evolved into
an established and very active research area. In this work, we survey the developments of …

Estimating the unseen: improved estimators for entropy and other properties

G Valiant, P Valiant - Journal of the ACM (JACM), 2017 - dl.acm.org
We show that a class of statistical properties of distributions, which includes such practically
relevant properties as entropy, the number of distinct elements, and distance metrics …

Demystifying Fixed k-Nearest Neighbor Information Estimators

W Gao, S Oh, P Viswanath - IEEE Transactions on Information …, 2018 - ieeexplore.ieee.org
Estimating mutual information from independent identically distributed samples drawn from
an unknown joint density function is a basic statistical problem of broad interest with …

The entropy of words—Learnability and expressivity across more than 1000 languages

C Bentz, D Alikaniotis, M Cysouw, R Ferrer-i-Cancho - Entropy, 2017 - mdpi.com
The choice associated with words is a fundamental property of natural languages. It lies at
the heart of quantitative linguistics, computational linguistics and language sciences more …

On the information bottleneck problems: Models, connections, applications and information theoretic views

A Zaidi, I Estella-Aguerri, S Shamai - Entropy, 2020 - mdpi.com
This tutorial paper focuses on the variants of the bottleneck problem taking an information
theoretic perspective and discusses practical methods to solve it, as well as its connection to …

Toward distributed energy services: Decentralizing optimal power flow with machine learning

R Dobbe, O Sondermeijer… - … on Smart Grid, 2019 - ieeexplore.ieee.org
The implementation of optimal power flow (OPF) methods to perform voltage and power flow
regulation in electric networks is generally believed to require extensive communication. We …