Adversarial GLUE: A multi-task benchmark for robustness evaluation of language models

B Wang, C Xu, S Wang, Z Gan, Y Cheng, J Gao… - arXiv preprint arXiv …, 2021 - arxiv.org
Large-scale pre-trained language models have achieved tremendous success across a
wide range of natural language understanding (NLU) tasks, even surpassing human …

Defending pre-trained language models from adversarial word substitutions without performance sacrifice

R Bao, J Wang, H Zhao - arXiv preprint arXiv:2105.14553, 2021 - arxiv.org
Pre-trained contextualized language models (PrLMs) have led to strong performance gains
in downstream natural language understanding tasks. However, PrLMs can still be easily …

Better robustness by more coverage: Adversarial training with mixup augmentation for robust fine-tuning

C Si, Z Zhang, F Qi, Z Liu, Y Wang, Q Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Pretrained language models (PLMs) perform poorly under adversarial attacks. To improve
the adversarial robustness, adversarial data augmentation (ADA) has been widely adopted …

InfoBERT: Improving robustness of language models from an information theoretic perspective

B Wang, S Wang, Y Cheng, Z Gan, R Jia, B Li… - arXiv preprint arXiv …, 2020 - arxiv.org
Large-scale language models such as BERT have achieved state-of-the-art performance
across a wide range of NLP tasks. Recent studies, however, show that such BERT-based …

A LLM assisted exploitation of AI-Guardian

N Carlini - arXiv preprint arXiv:2307.15008, 2023 - arxiv.org
Large language models (LLMs) are now highly capable at a diverse range of tasks. This
paper studies whether or not GPT-4, one such LLM, is capable of assisting researchers in …

Bridge the gap between CV and NLP! A gradient-based textual adversarial attack framework

L Yuan, Y Zhang, Y Chen, W Wei - arXiv preprint arXiv:2110.15317, 2021 - arxiv.org
Despite recent success on various tasks, deep learning techniques still perform poorly on
adversarial examples with small perturbations. While optimization-based methods for …

Certified robustness against natural language attacks by causal intervention

H Zhao, C Ma, X Dong, AT Luu… - International …, 2022 - proceedings.mlr.press
Deep learning models have achieved great success in many fields, yet they are vulnerable
to adversarial examples. This paper follows a causal perspective to look into the adversarial …

Contextualized perturbation for textual adversarial attack

D Li, Y Zhang, H Peng, L Chen, C Brockett… - arXiv preprint arXiv …, 2020 - arxiv.org
Adversarial examples expose the vulnerabilities of natural language processing (NLP)
models, and can be used to evaluate and improve their robustness. Existing techniques of …

Defense against adversarial attacks in NLP via Dirichlet neighborhood ensemble

Y Zhou, X Zheng, CJ Hsieh, K Chang… - arXiv preprint arXiv …, 2020 - arxiv.org
Although neural networks have achieved prominent performance on many natural language
processing (NLP) tasks, they are vulnerable to adversarial examples. In this paper, we …

Baseline defenses for adversarial attacks against aligned language models

N Jain, A Schwarzschild, Y Wen, G Somepalli… - arXiv preprint arXiv …, 2023 - arxiv.org
As Large Language Models quickly become ubiquitous, their security vulnerabilities are
critical to understand. Recent work shows that text optimizers can produce jailbreaking …