Certified robustness against natural language attacks by causal intervention

H Zhao, C Ma, X Dong, AT Luu… - International …, 2022 - proceedings.mlr.press
Deep learning models have achieved great success in many fields, yet they remain vulnerable
to adversarial examples. This paper takes a causal perspective to examine the adversarial …

Adversarial GLUE: A multi-task benchmark for robustness evaluation of language models

B Wang, C Xu, S Wang, Z Gan, Y Cheng, J Gao… - arXiv preprint arXiv …, 2021 - arxiv.org
Large-scale pre-trained language models have achieved tremendous success across a
wide range of natural language understanding (NLU) tasks, even surpassing human …

Defending pre-trained language models from adversarial word substitutions without performance sacrifice

R Bao, J Wang, H Zhao - arXiv preprint arXiv:2105.14553, 2021 - arxiv.org
Pre-trained contextualized language models (PrLMs) have led to strong performance gains
in downstream natural language understanding tasks. However, PrLMs can still be easily …

Defense against adversarial attacks in NLP via Dirichlet neighborhood ensemble

Y Zhou, X Zheng, CJ Hsieh, K Chang… - arXiv preprint arXiv …, 2020 - arxiv.org
Although neural networks have achieved prominent performance on many natural language
processing (NLP) tasks, they are vulnerable to adversarial examples. In this paper, we …

Defense of word-level adversarial attacks via random substitution encoding

Z Wang, H Wang - … 13th International Conference, KSEM 2020, Hangzhou …, 2020 - Springer
Adversarial attacks against deep neural networks on computer vision tasks have
spawned many new technologies that help protect models from making false predictions …

Natural language adversarial defense through synonym encoding

X Wang, J Hao, Y Yang, K He - Uncertainty in Artificial …, 2021 - proceedings.mlr.press
In the area of natural language processing, deep learning models have recently been shown to be
vulnerable to various types of adversarial perturbations, but relatively little work has been done on …

Defense against synonym substitution-based adversarial attacks via Dirichlet neighborhood ensemble

Y Zhou, X Zheng, CJ Hsieh, KW Chang… - Association for …, 2021 - par.nsf.gov
Although deep neural networks have achieved prominent performance on many NLP tasks,
they are vulnerable to adversarial examples. We propose Dirichlet Neighborhood Ensemble …

From hero to zéroe: A benchmark of low-level adversarial attacks

S Eger, Y Benz - Proceedings of the 1st conference of the Asia …, 2020 - aclanthology.org
Adversarial attacks are label-preserving modifications to inputs of machine learning
classifiers designed to fool machines but not humans. Natural Language Processing (NLP) …

Contextualized perturbation for textual adversarial attack

D Li, Y Zhang, H Peng, L Chen, C Brockett… - arXiv preprint arXiv …, 2020 - arxiv.org
Adversarial examples expose the vulnerabilities of natural language processing (NLP)
models, and can be used to evaluate and improve their robustness. Existing techniques of …

Better robustness by more coverage: Adversarial training with mixup augmentation for robust fine-tuning

C Si, Z Zhang, F Qi, Z Liu, Y Wang, Q Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Pretrained language models (PLMs) perform poorly under adversarial attacks. To improve
adversarial robustness, adversarial data augmentation (ADA) has been widely adopted …