Machine learning for synthetic data generation: a review

Y Lu, M Shen, H Wang, X Wang, C van Rechem… - arXiv preprint arXiv …, 2023 - arxiv.org
Machine learning heavily relies on data, but real-world applications often encounter various
data-related issues. These include data of poor quality, insufficient data points leading to …

Synthetic data generation: State of the art in health care domain

H Murtaza, M Ahmed, NF Khan, G Murtaza… - Computer Science …, 2023 - Elsevier
Recent progress in artificial intelligence and machine learning has led to the growth of
research in every aspect of life including the health care domain. However, privacy risks and …

Winning the NIST Contest: A scalable and general approach to differentially private synthetic data

R McKenna, G Miklau, D Sheldon - arXiv preprint arXiv:2108.04978, 2021 - arxiv.org
We propose a general approach for differentially private synthetic data generation, that
consists of three steps: (1) select a collection of low-dimensional marginals, (2) measure …
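The McKenna et al. snippet outlines a pipeline that begins by selecting low-dimensional marginals and then measuring them. This is a minimal sketch of only those two stated steps. It assumes discrete tabular records and adds Gaussian noise to each marginal count. The function `noisy_marginal`, the toy data, and the value of `sigma` are hypothetical. The sketch is not the authors' implementation and gives no privacy guarantee by itself, because it does not show how to calibrate `sigma` to a privacy budget.

```python
import random
from collections import Counter
from itertools import product


def noisy_marginal(records, cols, domains, sigma, rng=None):
    """Measure the marginal over `cols`, adding Gaussian noise to each count.

    Hypothetical helper for illustration, not code from the cited paper.
    records: list of dicts mapping column name -> discrete value
    domains: dict mapping column name -> list of possible values
    sigma:   noise standard deviation (assumed given; calibrating it to a
             privacy budget is out of scope for this sketch)
    """
    rng = rng if rng is not None else random.Random()
    # Exact counts for each observed combination of values in `cols`.
    counts = Counter(tuple(r[c] for c in cols) for r in records)
    # Release a value for every cell in the domain, including empty ones,
    # so the set of released cells does not depend on the data.
    cells = product(*(domains[c] for c in cols))
    return {cell: counts[cell] + rng.gauss(0.0, sigma) for cell in cells}


if __name__ == "__main__":
    # Toy data (hypothetical) with two discrete attributes.
    records = [
        {"age": "young", "sex": "F"},
        {"age": "young", "sex": "M"},
        {"age": "old", "sex": "F"},
        {"age": "young", "sex": "F"},
    ]
    domains = {"age": ["young", "old"], "sex": ["F", "M"]}
    # Step (1): choose a low-dimensional (here two-way) marginal.
    # Step (2): measure it with noise.
    noisy = noisy_marginal(records, ("age", "sex"), domains, sigma=1.0,
                           rng=random.Random(0))
    for cell, value in noisy.items():
        print(cell, round(value, 2))
```

The sketch releases a value for every cell in the domain, including cells absent from the data, so the shape of the output reveals nothing about which combinations occur. The later, elided steps of the cited approach are not modeled here.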

A multi-dimensional evaluation of synthetic data generators

FK Dankar, MK Ibrahim, L Ismail - IEEE Access, 2022 - ieeexplore.ieee.org
Synthetic datasets are gradually emerging as solutions for data sharing. Multiple synthetic
data generators have been introduced in the last decade fueled by advancement in machine …

Fake it till you make it: Guidelines for effective synthetic data generation

FK Dankar, M Ibrahim - Applied Sciences, 2021 - mdpi.com
Synthetic data provides a privacy protecting mechanism for the broad usage and sharing of
healthcare data for secondary purposes. It is considered a safe approach for the sharing of …

CounterFAccTual: How FAccT undermines its organizing principles

B Gansky, S McDonald - Proceedings of the 2022 ACM Conference on …, 2022 - dl.acm.org
This essay joins recent scholarship in arguing that FAccT's fundamental framing of the
potential to achieve the normative conditions for justice through bettering the design of …

Synthetic tabular data evaluation in the health domain covering resemblance, utility, and privacy dimensions

M Hernadez, G Epelde, A Alberdi… - … of information in …, 2023 - thieme-connect.com
Background Synthetic tabular data generation is a potentially valuable technology with great
promise for data augmentation and privacy preservation. However, prior to adoption, an …

Survey on privacy-preserving techniques for microdata publication

T Carvalho, N Moniz, P Faria, L Antunes - ACM Computing Surveys, 2023 - dl.acm.org
The exponential growth of collected, processed, and shared microdata has given rise to
concerns about individuals' privacy. As a result, laws and regulations have emerged to …

They shall be fair, transparent, and robust: auditing learning analytics systems

K Simbeck - AI and Ethics, 2024 - Springer
In the near future, systems, that use Artificial Intelligence (AI) methods, such as machine
learning, are required to be certified or audited for fairness if used in ethically sensitive fields …

Exploring city digital twins as policy tools: A task-based approach to generating synthetic data on urban mobility

G Papyshev, M Yarime - Data & Policy, 2021 - cambridge.org
This article discusses the technology of city digital twins (CDTs) and its potential applications
in the policymaking context. The article analyzes the history of the development of the …