Natural language adversarial attack and defense in word level

X Wang, H Jin, K He - 2019 - openreview.net
Inspired by the large body of research on adversarial examples in computer vision, interest in designing adversarial attacks for Natural Language Processing (NLP) tasks has grown recently, yet very few works address adversarial defenses for NLP. To our knowledge, no defense method exists against the successful synonym-substitution-based attacks, which aim to satisfy all lexical, grammatical, and semantic constraints and are therefore hard for humans to perceive. We fill this gap by proposing a novel adversarial defense method called Synonym Encoding Method (SEM), which inserts an encoder before the input layer of the model and then trains the model to eliminate adversarial perturbations. Extensive experiments demonstrate that SEM can efficiently defend against the current best synonym-substitution-based adversarial attacks with little decay in accuracy on benign examples. To better evaluate SEM, we also design a strong attack method called Improved Genetic Algorithm (IGA), which adopts the genetic metaheuristic for synonym-substitution-based attacks. Compared with the existing genetic-based adversarial attack, IGA achieves a higher attack success rate while maintaining the transferability of the adversarial examples.
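The idea of placing an encoder before the model's input layer can be illustrated with a minimal sketch. This is not the authors' implementation: the synonym groups, the rule of taking the first word of each group as its code, and the names `build_synonym_encoder` and `encode` are all assumptions made for illustration. The point is only that when every word in a synonym group maps to one shared token, a synonym-substituted input reaches the model as the same token sequence as the original.

```python
def build_synonym_encoder(synonym_groups):
    """Map every word in a synonym group to one shared code word.

    Assumption: the first word of each group serves as the code.
    The abstract does not say how the code is chosen.
    """
    encoder = {}
    for group in synonym_groups:
        representative = group[0]
        for word in group:
            # Keep the first assignment if a word appears in several groups.
            encoder.setdefault(word, representative)
    return encoder


def encode(tokens, encoder):
    """Replace each token by its code; unknown tokens pass through unchanged."""
    return [encoder.get(token, token) for token in tokens]


# Hypothetical synonym groups, for illustration only.
groups = [["good", "great", "fine"], ["movie", "film"]]
encoder = build_synonym_encoder(groups)

original = "a good movie".split()
adversarial = "a great film".split()  # synonym-substituted variant

# Both inputs collapse to the same encoded sequence before reaching the model.
encoded_original = encode(original, encoder)
encoded_adversarial = encode(adversarial, encoder)
```

In this sketch the encoded sequences would be fed to the classifier for both training and inference, so that perturbations confined to a synonym group have no effect on the model input.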