Machine learning for synthetic data generation: a review

Y Lu, M Shen, H Wang, X Wang, C van Rechem… - arXiv preprint arXiv …, 2023 - arxiv.org
Machine learning heavily relies on data, but real-world applications often encounter various
data-related issues. These include data of poor quality, insufficient data points leading to …

Synthcity: a benchmark framework for diverse use cases of tabular synthetic data

Z Qian, R Davis… - Advances in Neural …, 2024 - proceedings.neurips.cc
Accessible high-quality data is the bread and butter of machine learning research, and the
demand for data has exploded as larger and more advanced ML models are built across …

Reimagining synthetic tabular data generation through data-centric AI: A comprehensive benchmark

L Hansen, N Seedat… - Advances in Neural …, 2023 - proceedings.neurips.cc
Synthetic data serves as an alternative in training machine learning models, particularly
when real-world data is limited or inaccessible. However, ensuring that synthetic data …

A survey of synthetic data generation for machine learning

M Abufadda, K Mansour - 2021 22nd International Arab …, 2021 - ieeexplore.ieee.org
Data is the fuel of machine learning algorithms, therefore data generation in machine
learning is becoming an important topic. The problem is that finding enough data for …

Beyond privacy: Navigating the opportunities and challenges of synthetic data

B van Breugel, M van der Schaar - arXiv preprint arXiv:2304.03722, 2023 - arxiv.org
Generating synthetic data through generative models is gaining interest in the ML
community and beyond. In the past, synthetic data was often regarded as a means to private …

Synthetic data generation: State of the art in health care domain

H Murtaza, M Ahmed, NF Khan, G Murtaza… - Computer Science …, 2023 - Elsevier
Recent progress in artificial intelligence and machine learning has led to the growth of
research in every aspect of life including the health care domain. However, privacy risks and …

Fake it till you make it: Guidelines for effective synthetic data generation

FK Dankar, M Ibrahim - Applied Sciences, 2021 - mdpi.com
Synthetic data provides a privacy protecting mechanism for the broad usage and sharing of
healthcare data for secondary purposes. It is considered a safe approach for the sharing of …

Survey on synthetic data generation, evaluation methods and GANs

A Figueira, B Vaz - Mathematics, 2022 - mdpi.com
Synthetic data consists of artificially generated data. When data are scarce, or of poor
quality, synthetic data can be used, for example, to improve the performance of machine …

CorGAN: correlation-capturing convolutional generative adversarial networks for generating synthetic healthcare records

A Torfi, EA Fox - The Thirty-Third International FLAIRS Conference, 2020 - cdn.aaai.org
Deep learning models have demonstrated high-quality performance in areas such as image
classification and speech processing. However, creating a deep learning model using …

Exploiting asymmetry for synthetic training data generation: Synthie and the case of information extraction

M Josifoski, M Sakota, M Peyrard, R West - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have great potential for synthetic data generation. This work
shows that useful data can be synthetically generated even for tasks that cannot be solved …