Embedding-Driven Diversity Sampling to Improve Few-Shot Synthetic Data Generation

I Lopez, FN Haredasht, K Caoili, JH Chen… - arXiv preprint arXiv …, 2025 - arxiv.org
Accurate classification of clinical text often requires fine-tuning pre-trained language models,
a process that is costly and time-consuming due to the need for high-quality data and expert …