Survey on privacy-preserving techniques for microdata publication

T Carvalho, N Moniz, P Faria, L Antunes - ACM Computing Surveys, 2023 - dl.acm.org
The exponential growth of collected, processed, and shared microdata has given rise to
concerns about individuals' privacy. As a result, laws and regulations have emerged to …

Synthetic Data--what, why and how?

J Jordon, L Szpruch, F Houssiau, M Bottarelli… - arXiv preprint arXiv …, 2022 - arxiv.org
This explainer document aims to provide an overview of the current state of the rapidly
expanding work on synthetic data technologies, with a particular focus on privacy. The …

[BOOK][B] Synthetic data for deep learning

SI Nikolenko - 2021 - Springer
You are holding in your hands… oh, come on, who holds books like this in their hands
anymore? Anyway, you are reading this, and it means that I have managed to release one of …

Synthetic data–anonymisation groundhog day

T Stadler, B Oprisanu, C Troncoso - 31st USENIX Security Symposium …, 2022 - usenix.org
Synthetic data has been advertised as a silver-bullet solution to privacy-preserving data
publishing that addresses the shortcomings of traditional anonymisation techniques. The …

Synthesizing tabular data using generative adversarial networks

L Xu, K Veeramachaneni - arXiv preprint arXiv:1811.11264, 2018 - arxiv.org
Generative adversarial networks (GANs) implicitly learn the probability distribution of a
dataset and can draw samples from the distribution. This paper presents, Tabular GAN …

Comprehensive exploration of synthetic data generation: A survey

A Bauer, S Trapp, M Stenger, R Leppich… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent years have witnessed a surge in the popularity of Machine Learning (ML), applied
across diverse domains. However, progress is impeded by the scarcity of training data due …

Explainable decision forest: Transforming a decision forest into an interpretable tree

O Sagi, L Rokach - Information Fusion, 2020 - Elsevier
Decision forests are considered the best practice in many machine learning challenges,
mainly due to their superior predictive performance. However, simple models like decision …

Robin hood and matthew effects: Differential privacy has disparate impact on synthetic data

G Ganev, B Oprisanu… - … Conference on Machine …, 2022 - proceedings.mlr.press
Generative models trained with Differential Privacy (DP) can be used to generate synthetic
data while minimizing privacy risks. We analyze the impact of DP on these models vis-a-vis …

Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing

D Rankin, M Black, R Bond, J Wallace… - JMIR medical …, 2020 - medinform.jmir.org
Background: The exploitation of synthetic data in health care is at an early stage. Synthetic
data could unlock the potential within health care datasets that are too sensitive for release …

A multi-dimensional evaluation of synthetic data generators

FK Dankar, MK Ibrahim, L Ismail - IEEE Access, 2022 - ieeexplore.ieee.org
Synthetic datasets are gradually emerging as solutions for data sharing. Multiple synthetic
data generators have been introduced in the last decade fueled by advancement in machine …