Dream the impossible: Outlier imagination with diffusion models

X Du, Y Sun, J Zhu, Y Li - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Utilizing auxiliary outlier datasets to regularize the machine learning model has
demonstrated promise for out-of-distribution (OOD) detection and safe prediction. Due to the …

Self-consuming generative models go mad

S Alemohammad, J Casco-Rodriguez, L Luzi… - arXiv preprint arXiv …, 2023 - arxiv.org
Seismic advances in generative AI algorithms for imagery, text, and other data types have led
to the temptation to use synthetic data to train next-generation models. Repeating this …

HOIDiffusion: Generating realistic 3d hand-object interaction data

M Zhang, Y Fu, Z Ding, S Liu, Z Tu… - Proceedings of the …, 2024 - openaccess.thecvf.com
3D hand-object interaction data is scarce due to the hardware constraints in scaling
up the data collection process. In this paper we propose HOIDiffusion for generating realistic …

AI-generated images as data source: The dawn of synthetic era

Z Yang, F Zhan, K Liu, M Xu, S Lu - arXiv preprint arXiv:2310.01830, 2023 - arxiv.org
The advancement of visual intelligence is intrinsically tethered to the availability of data. In
parallel, generative Artificial Intelligence (AI) has unlocked the potential to create synthetic …

GenView: Enhancing view quality with pretrained generative model for self-supervised learning

X Li, Y Yang, X Li, J Wu, Y Yu, B Ghanem… - European Conference on …, 2025 - Springer
Self-supervised learning has achieved remarkable success in acquiring high-quality
representations from unlabeled data. The widely adopted contrastive learning framework …

Auditing and generating synthetic data with controllable trust trade-offs

B Belgodere, P Dognin, A Ivankay… - IEEE Journal on …, 2024 - ieeexplore.ieee.org
Real-world data often exhibits bias, imbalance, and privacy risks. Synthetic datasets have
emerged to address these issues by enabling a paradigm that relies on generative AI …

Domain Gap Embeddings for Generative Dataset Augmentation

YO Wang, Y Chung, CH Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
The performance of deep learning models is intrinsically tied to the quality, volume, and
relevance of their training data. Gathering ample data for production scenarios often …

Synthetic data as validation

Q Hu, A Yuille, Z Zhou - arXiv preprint arXiv:2310.16052, 2023 - arxiv.org
This study leverages synthetic data as a validation set to reduce overfitting and ease the
selection of the best model in AI development. While synthetic data have been used for …

From categories to classifier: Name-only continual learning by exploring the web

A Prabhu, HAAK Hammoud, SN Lim, B Ghanem… - arXiv preprint arXiv …, 2023 - arxiv.org
Continual Learning (CL) often relies on the availability of extensive annotated datasets, an
assumption that is unrealistically time-consuming and costly in practice. We explore a novel …

Towards Theoretical Understandings of Self-Consuming Generative Models

S Fu, S Zhang, Y Wang, X Tian, D Tao - arXiv preprint arXiv:2402.11778, 2024 - arxiv.org
This paper tackles the emerging challenge of training generative models within a self-
consuming loop, wherein successive generations of models are recursively trained on …