Improving the reliability of deep neural networks in NLP: A review

B Alshemali, J Kalita - Knowledge-Based Systems, 2020 - Elsevier
Deep learning models have achieved great success in solving a variety of natural language
processing (NLP) problems. An ever-growing body of research, however, illustrates the …

The future of false information detection on social media: New perspectives and trends

B Guo, Y Ding, L Yao, Y Liang, Z Yu - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
The massive spread of false information on social media has become a global risk, implicitly
influencing public opinion and threatening social/political development. False information …

Red teaming language models with language models

E Perez, S Huang, F Song, T Cai, R Ring… - arXiv preprint arXiv …, 2022 - arxiv.org
Language Models (LMs) often cannot be deployed because of their potential to harm users
in hard-to-predict ways. Prior work identifies harmful behaviors before deployment by using …

Universal adversarial triggers for attacking and analyzing NLP

E Wallace, S Feng, N Kandpal, M Gardner… - arXiv preprint arXiv …, 2019 - arxiv.org
Adversarial examples highlight model vulnerabilities and are useful for evaluation and
interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens …

Red teaming language model detectors with language models

Z Shi, Y Wang, F Yin, X Chen, KW Chang… - Transactions of the …, 2024 - direct.mit.edu
The prevalence and strong capability of large language models (LLMs) present significant
safety and ethical risks if exploited by malicious users. To prevent the potentially deceptive …

AdvCLIP: Downstream-agnostic adversarial examples in multimodal contrastive learning

Z Zhou, S Hu, M Li, H Zhang, Y Zhang… - Proceedings of the 31st …, 2023 - dl.acm.org
Multimodal contrastive learning aims to train a general-purpose feature extractor, such as
CLIP, on vast amounts of raw, unlabeled paired image-text data. This can greatly benefit …

Adversarial attack and defense technologies in natural language processing: A survey

S Qiu, Q Liu, S Zhou, W Huang - Neurocomputing, 2022 - Elsevier
Recently, adversarial attack and defense technologies have made remarkable
achievements and have been widely applied in the computer vision field, promoting its rapid …

A survey on universal adversarial attack

C Zhang, P Benz, C Lin, A Karjauv, J Wu… - arXiv preprint arXiv …, 2021 - arxiv.org
The intriguing phenomenon of adversarial examples has attracted significant attention in
machine learning, and what might be more surprising to the community is the existence of …

T-Miner: A generative approach to defend against trojan attacks on DNN-based text classification

A Azizi, IA Tahmid, A Waheed, N Mangaokar… - 30th USENIX Security …, 2021 - usenix.org
Deep Neural Network (DNN) classifiers are known to be vulnerable to Trojan or backdoor
attacks, where the classifier is manipulated such that it misclassifies any input containing an …

Adversarial threats to deepfake detection: A practical perspective

P Neekhara, B Dolhansky, J Bitton… - Proceedings of the …, 2021 - openaccess.thecvf.com
Facially manipulated images and videos, or DeepFakes, can be used maliciously to fuel
misinformation or defame individuals. Therefore, detecting DeepFakes is crucial to increase …