Analysis methods in neural language processing: A survey

Y Belinkov, J Glass - … of the Association for Computational Linguistics, 2019 - direct.mit.edu
The field of natural language processing has seen impressive progress in recent years, with
neural network models replacing many of the traditional systems. A plethora of new models …

A survey of adversarial defenses and robustness in nlp

S Goyal, S Doddapaneni, MM Khapra… - ACM Computing …, 2023 - dl.acm.org
In the past few years, it has become increasingly evident that deep neural networks are not
resilient enough to withstand adversarial perturbations in input data, leaving them …

Scaling laws for reward model overoptimization

L Gao, J Schulman, J Hilton - International Conference on …, 2023 - proceedings.mlr.press
In reinforcement learning from human feedback, it is common to optimize against a reward
model trained to predict human preferences. Because the reward model is an imperfect …

Dynabench: Rethinking benchmarking in NLP

D Kiela, M Bartolo, Y Nie, D Kaushik, A Geiger… - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …

Universal adversarial triggers for attacking and analyzing NLP

E Wallace, S Feng, N Kandpal, M Gardner… - arXiv preprint arXiv …, 2019 - arxiv.org
Adversarial examples highlight model vulnerabilities and are useful for evaluation and
interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens …

Weight poisoning attacks on pre-trained models

K Kurita, P Michel, G Neubig - arXiv preprint arXiv:2004.06660, 2020 - arxiv.org
Recently, NLP has seen a surge in the usage of large pre-trained models. Users download
weights of models pre-trained on large datasets, then fine-tune the weights on a task of their …

Adversarial attacks on deep-learning models in natural language processing: A survey

WE Zhang, QZ Sheng, A Alhazmi, C Li - ACM Transactions on Intelligent …, 2020 - dl.acm.org
With the development of high computational devices, deep neural networks (DNNs), in
recent years, have gained significant popularity in many Artificial Intelligence (AI) …

Natural attack for pre-trained models of code

Z Yang, J Shi, J He, D Lo - … of the 44th International Conference on …, 2022 - dl.acm.org
Pre-trained models of code have achieved success in many important software engineering
tasks. However, these powerful models are vulnerable to adversarial attacks that slightly …

Review of artificial intelligence adversarial attack and defense technologies

S Qiu, Q Liu, S Zhou, C Wu - Applied Sciences, 2019 - mdpi.com
In recent years, artificial intelligence technologies have been widely used in computer
vision, natural language processing, automatic driving, and other fields. However, artificial …

An empirical survey of data augmentation for limited data learning in nlp

J Chen, D Tam, C Raffel, M Bansal… - Transactions of the …, 2023 - direct.mit.edu
NLP has achieved great progress in the past decade through the use of neural models and
large labeled datasets. The dependence on abundant data prevents NLP models from being …