Exposed! A survey of attacks on private data

C Dwork, A Smith, T Steinke… - Annual Review of …, 2017 - annualreviews.org
Privacy-preserving statistical data analysis addresses the general question of protecting
privacy when publicly releasing information about a sensitive dataset. A privacy attack takes …

More than privacy: Applying differential privacy in key areas of artificial intelligence

T Zhu, D Ye, W Wang, W Zhou… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Artificial Intelligence (AI) has attracted a great deal of attention in recent years. However,
alongside all its advancements, problems have also emerged, such as privacy violations …

The complexity of differential privacy

S Vadhan - Tutorials on the Foundations of Cryptography …, 2017 - Springer
Differential privacy is a theoretical framework for ensuring the privacy of individual-level data
when performing statistical analysis of privacy-sensitive datasets. This tutorial provides an …

Reasoning about generalization via conditional mutual information

T Steinke, L Zakynthinou - Conference on Learning Theory, 2020 - proceedings.mlr.press
We provide an information-theoretic framework for studying the generalization properties of
machine learning algorithms. Our framework ties together existing approaches, including …

Preserving statistical validity in adaptive data analysis

C Dwork, V Feldman, M Hardt, T Pitassi… - Proceedings of the forty …, 2015 - dl.acm.org
A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries,
from the use of sophisticated validation techniques, to deep statistical methods for …

Algorithmic stability for adaptive data analysis

R Bassily, K Nissim, A Smith, T Steinke… - Proceedings of the forty …, 2016 - dl.acm.org
Adaptivity is an important feature of data analysis: the choice of questions to ask about a
dataset often depends on previous interactions with the same dataset. However, statistical …

Generalization in adaptive data analysis and holdout reuse

C Dwork, V Feldman, M Hardt… - Advances in neural …, 2015 - proceedings.neurips.cc
Overfitting is the bane of data analysts, even when data are plentiful. Formal approaches to
understanding this problem focus on statistical inference and generalization of individual …

Controlling bias in adaptive data analysis using information theory

D Russo, J Zou - Artificial Intelligence and Statistics, 2016 - proceedings.mlr.press
Modern big data settings often involve messy, high-dimensional data, where it is not clear a
priori what are the right questions to ask. To extract the most insights from a dataset, the …

How much does your data exploration overfit? Controlling bias via information usage

D Russo, J Zou - IEEE Transactions on Information Theory, 2019 - ieeexplore.ieee.org
Modern data is messy and high-dimensional, and it is often not clear a priori what are the
right questions to ask. Instead, the analyst typically needs to use the data to search for …

Eleven quick tips for data cleaning and feature engineering

D Chicco, L Oneto, E Tavazzi - PLOS Computational Biology, 2022 - journals.plos.org
Applying computational statistics or machine learning methods to data is a key component of
many scientific studies, in any field, but alone might not be sufficient to generate robust and …