Parallel learning: Overview and perspective for computational learning across Syn2Real and Sim2Real

Q Miao, Y Lv, M Huang, X Wang… - IEEE/CAA Journal of …, 2023 - ieeexplore.ieee.org
The virtual-to-real paradigm, i.e., training models on virtual data and then applying them to
solve real-world problems, has attracted more and more attention from various domains by …

A survey on data selection for language models

A Albalak, Y Elazar, SM Xie, S Longpre… - arXiv preprint arXiv …, 2024 - arxiv.org
A major factor in the recent success of large language models is the use of enormous and
ever-growing text datasets for unsupervised pre-training. However, naively training a model …

NL-Augmenter: A framework for task-sensitive natural language augmentation

KD Dhole, V Gangal, S Gehrmann, A Gupta, Z Li… - arXiv preprint arXiv …, 2021 - arxiv.org
Data augmentation is an important component in the robustness evaluation of models in
natural language processing (NLP) and in enhancing the diversity of the data they are …

A survey on GAN techniques for data augmentation to address the imbalanced data issues in credit card fraud detection

E Strelcenia, S Prakoonwit - Machine Learning and Knowledge Extraction, 2023 - mdpi.com
Data augmentation is an important procedure in deep learning. GAN-based data
augmentation can be utilized in many domains. For instance, in the credit card fraud domain …

Large language models as annotators: Enhancing generalization of NLP models at minimal cost

P Bansal, A Sharma - arXiv preprint arXiv:2306.15766, 2023 - arxiv.org
State-of-the-art supervised NLP models achieve high accuracy but are also susceptible to
failures on inputs from low-data regimes, such as domains that are not represented in …

Data augmentation using LLMs: Data perspectives, learning paradigms and challenges

B Ding, C Qin, R Zhao, T Luo, X Li, G Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
In the rapidly evolving field of machine learning (ML), data augmentation (DA) has emerged
as a pivotal technique for enhancing model performance by diversifying training examples …

DoCoGen: Domain counterfactual generation for low resource domain adaptation

N Calderon, E Ben-David, A Feder… - arXiv preprint arXiv …, 2022 - arxiv.org
Natural language processing (NLP) algorithms have become very successful, but they still
struggle when applied to out-of-distribution examples. In this paper we propose a …

ALP: Data augmentation using lexicalized PCFGs for few-shot text classification

HH Kim, D Woo, SJ Oh, JW Cha, YS Han - Proceedings of the AAAI …, 2022 - ojs.aaai.org
Data augmentation has been an important ingredient for boosting the performance of learned
models. Prior data augmentation methods for few-shot text classification have led to great …

To augment or not to augment? A comparative study on text augmentation techniques for low-resource NLP

GG Şahin - Computational Linguistics, 2022 - direct.mit.edu
Data-hungry deep neural networks have established themselves as the de facto standard for
many NLP tasks, including the traditional sequence tagging ones. Despite their state-of-the …

Text autoaugment: Learning compositional augmentation policy for text classification

S Ren, J Zhang, L Li, X Sun, J Zhou - arXiv preprint arXiv:2109.00523, 2021 - arxiv.org
Data augmentation aims to enrich training samples for alleviating the overfitting issue in low-
resource or class-imbalanced situations. Traditional methods first devise task-specific …