A survey of adversarial defenses and robustness in NLP

S Goyal, S Doddapaneni, MM Khapra… - ACM Computing …, 2023 - dl.acm.org
In the past few years, it has become increasingly evident that deep neural networks are not
resilient enough to withstand adversarial perturbations in input data, leaving them …

Exploiting programmatic behavior of LLMs: Dual-use through standard security attacks

D Kang, X Li, I Stoica, C Guestrin… - 2024 IEEE Security …, 2024 - ieeexplore.ieee.org
Recent advances in instruction-following large language models (LLMs) have led to
dramatic improvements in a range of NLP tasks. Unfortunately, we find that the same …

Contextualized perturbation for textual adversarial attack

D Li, Y Zhang, H Peng, L Chen, C Brockett… - arXiv preprint arXiv …, 2020 - arxiv.org
Adversarial examples expose the vulnerabilities of natural language processing (NLP)
models, and can be used to evaluate and improve their robustness. Existing techniques of …

Adversarial attack and defense technologies in natural language processing: A survey

S Qiu, Q Liu, S Zhou, W Huang - Neurocomputing, 2022 - Elsevier
Recently, adversarial attack and defense technology has made remarkable
achievements and has been widely applied in the computer vision field, promoting its rapid …

Evaluating the robustness of neural language models to input perturbations

M Moradi, M Samwald - arXiv preprint arXiv:2108.12237, 2021 - arxiv.org
High-performance neural language models have obtained state-of-the-art results on a wide
range of Natural Language Processing (NLP) tasks. However, results for common …

Better robustness by more coverage: Adversarial training with mixup augmentation for robust fine-tuning

C Si, Z Zhang, F Qi, Z Liu, Y Wang, Q Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Pretrained language models (PLMs) perform poorly under adversarial attacks. To improve
the adversarial robustness, adversarial data augmentation (ADA) has been widely adopted …

Towards a robust deep neural network against adversarial texts: A survey

W Wang, R Wang, L Wang, Z Wang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Deep neural networks (DNNs) have achieved remarkable success in various tasks (e.g.,
image classification, speech recognition, and natural language processing (NLP)). However …

Text adversarial attacks and defenses: Issues, taxonomy, and perspectives

X Han, Y Zhang, W Wang… - Security and …, 2022 - Wiley Online Library
Deep neural networks (DNNs) have been widely used in many fields due to their powerful
representation learning capabilities. However, they are exposed to serious threats caused …

Why should adversarial perturbations be imperceptible? Rethink the research paradigm in adversarial NLP

Y Chen, H Gao, G Cui, F Qi, L Huang, Z Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Textual adversarial samples play important roles in multiple subfields of NLP research,
including security, evaluation, explainability, and data augmentation. However, most work …

Detection of word adversarial examples in text classification: Benchmark and baseline via robust density estimation

KY Yoo, J Kim, J Jang, N Kwak - arXiv preprint arXiv:2203.01677, 2022 - arxiv.org
Word-level adversarial attacks have shown success in NLP models, drastically decreasing
the performance of transformer-based models in recent years. As a countermeasure …