Better robustness by more coverage: Adversarial training with mixup augmentation for robust fine-tuning

C Si, Z Zhang, F Qi, Z Liu, Y Wang, Q Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Pretrained language models (PLMs) perform poorly under adversarial attacks. To improve
adversarial robustness, adversarial data augmentation (ADA) has been widely adopted …
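The snippet truncates before the method itself; as a rough illustration of how mixup-style augmentation over clean and adversarial examples might be wired into fine-tuning, here is a minimal PyTorch sketch (the embedding-level interpolation, function names, and hyperparameters are illustrative assumptions, not the paper's exact procedure):

```python
import torch
import torch.nn.functional as F

def mixup_batch(emb_a, emb_b, labels_a, labels_b, num_classes, alpha=1.0):
    """Interpolate two batches (e.g. clean and adversarial) in embedding space.

    emb_* are [batch, seq_len, hidden] input embeddings, labels_* are class
    indices; returns mixed embeddings and soft labels.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed_emb = lam * emb_a + (1.0 - lam) * emb_b
    y_a = F.one_hot(labels_a, num_classes).float()
    y_b = F.one_hot(labels_b, num_classes).float()
    mixed_labels = lam * y_a + (1.0 - lam) * y_b
    return mixed_emb, mixed_labels

def mixup_loss(logits, soft_labels):
    # Cross-entropy against the interpolated (soft) labels.
    return -(soft_labels * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```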

Generating fluent adversarial examples for natural languages

H Zhang, H Zhou, N Miao, L Li - arXiv preprint arXiv:2007.06174, 2020 - arxiv.org
Efficiently building an adversarial attacker for natural language processing (NLP) tasks is a
real challenge. Firstly, as the sentence space is discrete, it is difficult to make small …

Flooding-X: Improving BERT's resistance to adversarial attacks via loss-restricted fine-tuning

Q Liu, R Zheng, B Rong, J Liu, Z Liu… - Proceedings of the …, 2022 - aclanthology.org
Adversarial robustness has attracted much attention recently, and the mainstream solution is
adversarial training. However, the tradition of generating adversarial perturbations for each …
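The "loss-restricted fine-tuning" in the title builds on the flooding regularizer (Ishida et al., 2020), which keeps the training loss from falling below a floor b. A minimal sketch of that regularizer (the fixed `flood_level` and the surrounding training loop are assumptions, not the paper's criterion for choosing the flood level):

```python
import torch.nn.functional as F

def flooded_loss(logits, labels, flood_level=0.1):
    """Flooding: once the loss drops below the flood level b, the sign of the
    gradient flips, so training hovers around b instead of reaching zero loss.
    """
    loss = F.cross_entropy(logits, labels)
    return (loss - flood_level).abs() + flood_level
```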

Adversarial GLUE: A multi-task benchmark for robustness evaluation of language models

B Wang, C Xu, S Wang, Z Gan, Y Cheng, J Gao… - arXiv preprint arXiv …, 2021 - arxiv.org
Large-scale pre-trained language models have achieved tremendous success across a
wide range of natural language understanding (NLU) tasks, even surpassing human …

Bridge the gap between CV and NLP! A gradient-based textual adversarial attack framework

L Yuan, Y Zhang, Y Chen, W Wei - arXiv preprint arXiv:2110.15317, 2021 - arxiv.org
Despite recent success on various tasks, deep learning techniques still perform poorly on
adversarial examples with small perturbations. While optimization-based methods for …
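The snippet stops before describing the framework; for context, a generic projected-gradient (sign) step on continuous input embeddings, the kind of CV-style perturbation the title alludes to, might look like the sketch below. The function name and hyperparameters are assumptions, and decoding the perturbed embeddings back to discrete tokens, which is the hard part such a framework must address, is omitted:

```python
import torch

def embedding_pgd_step(embeddings, grad, step_size=0.01, epsilon=0.1):
    """One sign-gradient step on continuous input embeddings, projected back
    into an L-infinity ball of radius epsilon around the original embeddings.
    """
    perturbed = embeddings + step_size * grad.sign()
    delta = torch.clamp(perturbed - embeddings.detach(), -epsilon, epsilon)
    return embeddings.detach() + delta
```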

Defending pre-trained language models from adversarial word substitutions without performance sacrifice

R Bao, J Wang, H Zhao - arXiv preprint arXiv:2105.14553, 2021 - arxiv.org
Pre-trained contextualized language models (PrLMs) have led to strong performance gains
in downstream natural language understanding tasks. However, PrLMs can still be easily …

Multi-granularity textual adversarial attack with behavior cloning

Y Chen, J Su, W Wei - arXiv preprint arXiv:2109.04367, 2021 - arxiv.org
Recently, textual adversarial attack models have become increasingly popular due to their
success in estimating the robustness of NLP models. However, existing works have …

SemAttack: Natural textual attacks via different semantic spaces

B Wang, C Xu, X Liu, Y Cheng, B Li - arXiv preprint arXiv:2205.01287, 2022 - arxiv.org
Recent studies show that pre-trained language models (LMs) are vulnerable to textual
adversarial attacks. However, existing attack methods either suffer from low attack success …

Contextualized perturbation for textual adversarial attack

D Li, Y Zhang, H Peng, L Chen, C Brockett… - arXiv preprint arXiv …, 2020 - arxiv.org
Adversarial examples expose the vulnerabilities of natural language processing (NLP)
models, and can be used to evaluate and improve their robustness. Existing techniques of …
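Contextualized perturbations of this kind are typically proposed with a masked language model; below is a minimal sketch of generating in-context substitution candidates with the Hugging Face fill-mask pipeline (the model choice and function name are assumptions, and this covers only a replace-style action, not the paper's full method):

```python
from transformers import pipeline

# Masked LM used to propose substitutions that fit the surrounding context.
fill_mask = pipeline("fill-mask", model="roberta-base")

def contextual_substitutes(tokens, position, top_k=5):
    """Mask one token and let the masked LM propose contextual replacements."""
    masked = tokens.copy()
    masked[position] = fill_mask.tokenizer.mask_token
    candidates = fill_mask(" ".join(masked), top_k=top_k)
    return [(c["token_str"].strip(), c["score"]) for c in candidates]

# Example: propose in-context replacements for "movie" in a short review.
print(contextual_substitutes("the movie was absolutely wonderful".split(), 1))
```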

Defense against adversarial attacks in NLP via Dirichlet neighborhood ensemble

Y Zhou, X Zheng, CJ Hsieh, K Chang… - arXiv preprint arXiv …, 2020 - arxiv.org
Although neural networks have achieved prominent performance on many natural language
processing (NLP) tasks, they are vulnerable to adversarial examples. In this paper, we …
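The Dirichlet neighborhood idea can be pictured as training on random convex combinations of a word's embedding and its neighbors' (e.g. synonyms') embeddings, with mixing weights drawn from a Dirichlet distribution; a minimal sketch under that reading (the function name, how neighbors are obtained, and the concentration parameter are assumptions):

```python
import torch

def dirichlet_neighborhood_embedding(word_emb, neighbor_embs, concentration=1.0):
    """Sample a convex combination of a word's embedding and its neighbors'.

    word_emb: [hidden]; neighbor_embs: [k, hidden]. Weights over the word and
    its k neighbors are drawn from a symmetric Dirichlet distribution.
    """
    stacked = torch.cat([word_emb.unsqueeze(0), neighbor_embs], dim=0)  # [k+1, hidden]
    alpha = torch.full((stacked.size(0),), concentration)
    weights = torch.distributions.Dirichlet(alpha).sample()             # [k+1]
    return (weights.unsqueeze(-1) * stacked).sum(dim=0)                 # [hidden]
```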