Defending pre-trained language models from adversarial word substitutions without performance sacrifice

R Bao, J Wang, H Zhao - arXiv preprint arXiv:2105.14553, 2021 - arxiv.org
Pre-trained contextualized language models (PrLMs) have led to strong performance gains
in downstream natural language understanding tasks. However, PrLMs can still be easily …

Rethinking textual adversarial defense for pre-trained language models

J Wang, R Bao, Z Zhang, H Zhao - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Although pre-trained language models (PrLMs) have achieved significant success, recent
studies demonstrate that PrLMs are vulnerable to adversarial attacks. By generating …

Adversarial GLUE: A multi-task benchmark for robustness evaluation of language models

B Wang, C Xu, S Wang, Z Gan, Y Cheng, J Gao… - arXiv preprint arXiv …, 2021 - arxiv.org
Large-scale pre-trained language models have achieved tremendous success across a
wide range of natural language understanding (NLU) tasks, even surpassing human …

Defense against adversarial attacks in NLP via Dirichlet neighborhood ensemble

Y Zhou, X Zheng, CJ Hsieh, K Chang… - arXiv preprint arXiv …, 2020 - arxiv.org
Although neural networks have achieved prominent performance on many natural language
processing (NLP) tasks, they are vulnerable to adversarial examples. In this paper, we …

Better robustness by more coverage: Adversarial training with mixup augmentation for robust fine-tuning

C Si, Z Zhang, F Qi, Z Liu, Y Wang, Q Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Pretrained language models (PLMs) perform poorly under adversarial attacks. To improve
the adversarial robustness, adversarial data augmentation (ADA) has been widely adopted …

RMLM: A flexible defense framework for proactively mitigating word-level adversarial attacks

Z Wang, Z Liu, X Zheng, Q Su… - Proceedings of the 61st …, 2023 - aclanthology.org
Adversarial attacks on deep neural networks continue to raise security concerns in natural
language processing research. Existing defenses focus on improving the robustness of the …

Towards Semantics- and Domain-Aware Adversarial Attacks

J Zhang, YC Huang, W Wu, MR Lyu - IJCAI, 2023 - ijcai.org
Language models are known to be vulnerable to textual adversarial attacks, which add
human-imperceptible perturbations to the input to mislead DNNs. It is thus imperative to …

Phrase-level textual adversarial attack with label preservation

Y Lei, Y Cao, D Li, T Zhou, M Fang… - arXiv preprint arXiv …, 2022 - arxiv.org
Generating high-quality textual adversarial examples is critical for investigating the pitfalls of
natural language processing (NLP) models and further promoting their robustness. Existing …

Searching for an effective defender: Benchmarking defense against adversarial word substitution

Z Li, J Xu, J Zeng, L Li, X Zheng, Q Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent studies have shown that deep neural networks are vulnerable to intentionally crafted
adversarial examples, and various methods have been proposed to defend against …

Defense against synonym substitution-based adversarial attacks via Dirichlet neighborhood ensemble

Y Zhou, X Zheng, CJ Hsieh, KW Chang… - Association for …, 2021 - par.nsf.gov
Although deep neural networks have achieved prominent performance on many NLP tasks,
they are vulnerable to adversarial examples. We propose Dirichlet Neighborhood Ensemble …