A survey on data augmentation for text classification

M Bayer, MA Kaufhold, C Reuter - ACM Computing Surveys, 2022 - dl.acm.org
Data augmentation, the artificial creation of training data for machine learning by
transformations, is a widely studied research field across machine learning disciplines …
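    
As a minimal sketch of what transformation-based text augmentation can look like (illustrative only, not drawn from the survey itself), the following Python snippet applies random deletion and random swap to a tokenized sentence; the function names, rates, and example sentence are assumptions.

    import random

    def random_deletion(tokens, p=0.1):
        # Drop each token with probability p; always keep at least one token.
        kept = [t for t in tokens if random.random() > p]
        return kept if kept else [random.choice(tokens)]

    def random_swap(tokens, n_swaps=1):
        # Swap two randomly chosen token positions, n_swaps times.
        tokens = list(tokens)
        if len(tokens) < 2:
            return tokens
        for _ in range(n_swaps):
            i, j = random.sample(range(len(tokens)), 2)
            tokens[i], tokens[j] = tokens[j], tokens[i]
        return tokens

    sentence = "data augmentation creates new training examples from existing ones".split()
    print(random_deletion(sentence))
    print(random_swap(sentence, n_swaps=2))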

A survey on contrastive self-supervised learning

A Jaiswal, AR Babu, MZ Zadeh, D Banerjee… - Technologies, 2020 - mdpi.com
Self-supervised learning has gained popularity because of its ability to avoid the cost of
annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as …
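    
For orientation, a minimal PyTorch sketch of the NT-Xent (InfoNCE-style) objective that much of the surveyed contrastive self-supervised work builds on; the batch size, embedding dimension, and temperature are assumed values, not taken from the survey.

    import torch
    import torch.nn.functional as F

    def nt_xent_loss(z1, z2, temperature=0.5):
        # z1[i] and z2[i] are embeddings of two views of the same example.
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        z = torch.cat([z1, z2], dim=0)                    # (2N, d)
        sim = z @ z.t() / temperature                     # pairwise cosine similarities
        n = z1.size(0)
        sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))  # drop self-pairs
        # The positive for row i is its other view at index (i + n) mod 2N.
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
        return F.cross_entropy(sim, targets)

    z1, z2 = torch.randn(8, 128), torch.randn(8, 128)     # stand-ins for two augmented views
    print(nt_xent_loss(z1, z2).item())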

Text and code embeddings by contrastive pre-training

A Neelakantan, T Xu, R Puri, A Radford, JM Han… - arXiv preprint arXiv …, 2022 - arxiv.org
Text embeddings are useful features in many applications such as semantic search and
computing text similarity. Previous work typically trains models customized for different use …
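    
As a usage illustration of text embeddings for semantic search, a short sketch with an off-the-shelf sentence-transformers model standing in for the contrastively pre-trained embedder described in the paper; the model name, corpus, and query are assumptions.

    from sentence_transformers import SentenceTransformer, util

    # The model name is only an example stand-in for a contrastively trained embedder.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    corpus = [
        "Contrastive pre-training yields general-purpose text embeddings.",
        "The recipe calls for two cups of flour and one egg.",
        "Sentence embeddings make semantic search straightforward.",
    ]
    query = "How are text embeddings used for search?"

    corpus_emb = model.encode(corpus, convert_to_tensor=True)
    query_emb = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, corpus_emb)[0]       # cosine similarity to each document
    best = int(scores.argmax())
    print(corpus[best], float(scores[best]))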

ConSERT: A contrastive framework for self-supervised sentence representation transfer

Y Yan, R Li, S Wang, F Zhang, W Wu, W Xu - arXiv preprint arXiv …, 2021 - arxiv.org
Learning high-quality sentence representations benefits a wide range of natural language
processing tasks. Though BERT-based pre-trained language models achieve high …

Contrastive representation learning: A framework and review

PH Le-Khac, G Healy, AF Smeaton - IEEE Access, 2020 - ieeexplore.ieee.org
Contrastive Learning has recently received interest due to its success in self-supervised
representation learning in the computer vision domain. However, the origins of Contrastive …

AMMUS: A survey of transformer-based pretrained models in natural language processing

KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …

Supervised contrastive learning for pre-trained language model fine-tuning

B Gunel, J Du, A Conneau, V Stoyanov - arXiv preprint arXiv:2011.01403, 2020 - arxiv.org
State-of-the-art natural language understanding classification models follow two stages: pre-
training a large language model on an auxiliary task, and then fine-tuning the model on a …
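    
A rough sketch of a supervised contrastive term of the kind used alongside cross-entropy during fine-tuning, in which examples sharing a label act as positives; the shapes, temperature, and exact normalization below are assumptions rather than the paper's formulation.

    import torch
    import torch.nn.functional as F

    def supervised_contrastive_loss(features, labels, temperature=0.1):
        # Examples that share a label are treated as positives for each other.
        features = F.normalize(features, dim=1)
        sim = features @ features.t() / temperature                     # (N, N)
        self_mask = torch.eye(labels.size(0), dtype=torch.bool)
        sim.masked_fill_(self_mask, float("-inf"))                      # exclude self-similarity
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
        pos_counts = pos_mask.sum(dim=1).clamp(min=1)
        # Mean log-probability over each anchor's positives, averaged over anchors that have any.
        per_anchor = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
        return per_anchor[pos_mask.any(dim=1)].mean()

    features = torch.randn(16, 64)                 # e.g. sentence embeddings from an encoder
    labels = torch.randint(0, 4, (16,))
    print(supervised_contrastive_loss(features, labels).item())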

CLEAR: Contrastive learning for sentence representation

Z Wu, S Wang, J Gu, M Khabsa, F Sun, H Ma - arXiv preprint arXiv …, 2020 - arxiv.org
Pre-trained language models have proven their unique powers in capturing implicit
language features. However, most pre-training approaches focus on the word-level training …

Self-guided contrastive learning for BERT sentence representations

T Kim, KM Yoo, S Lee - arXiv preprint arXiv:2106.07345, 2021 - arxiv.org
Although BERT and its variants have reshaped the NLP landscape, it still remains unclear
how best to derive sentence embeddings from such pre-trained Transformers. In this work …

COCO-LM: Correcting and contrasting text sequences for language model pretraining

Y Meng, C Xiong, P Bajaj, P Bennett… - Advances in Neural …, 2021 - proceedings.neurips.cc
We present a self-supervised learning framework, COCO-LM, that pretrains Language
Models by COrrecting and COntrasting corrupted text sequences. Following ELECTRA-style …
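    
As a toy illustration of corrupted text sequences (not COCO-LM's actual ELECTRA-style generator), the snippet below randomly replaces tokens and keeps the clean sequence as the correction target; the vocabulary, replacement rate, and example sentence are assumptions.

    import random

    def corrupt(tokens, vocab, replace_prob=0.15):
        # Randomly replace a fraction of tokens; the clean sequence is the
        # correction target (a toy stand-in for generator-produced corruptions).
        corrupted = [random.choice(vocab) if random.random() < replace_prob else t
                     for t in tokens]
        replaced = [c != t for c, t in zip(corrupted, tokens)]
        return corrupted, tokens, replaced

    vocab = ["the", "a", "model", "text", "language", "learns", "corrupts"]
    clean = "the model learns language from raw text".split()
    corrupted, targets, replaced = corrupt(clean, vocab)
    print(corrupted)
    print(replaced)                                 # positions a model would need to correct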