Effective backdoor defense by exploiting sensitivity of poisoned samples

W Chen, B Wu, H Wang - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Poisoning-based backdoor attacks are a serious threat when training deep models on data from
untrustworthy sources. Given a backdoored model, we observe that the feature …

Backdoor defense via adaptively splitting poisoned dataset

K Gao, Y Bai, J Gu, Y Yang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Backdoor defenses have been studied to alleviate the threat of deep neural networks
(DNNs) being backdoor-attacked and thus maliciously altered. Since DNNs usually adopt …

Neural polarizer: A lightweight and effective backdoor defense via purifying poisoned features

M Zhu, S Wei, H Zha, B Wu - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent studies have demonstrated the susceptibility of deep neural networks to backdoor
attacks. Given a backdoored model, its prediction of a poisoned sample with a trigger will be …

Shared adversarial unlearning: Backdoor mitigation by unlearning shared adversarial examples

S Wei, M Zhang, H Zha, B Wu - Advances in Neural …, 2023 - proceedings.neurips.cc
Backdoor attacks are serious security threats to machine learning models, where an
adversary can inject poisoned samples into the training set, causing a backdoored model …

A unified evaluation of textual backdoor learning: Frameworks and benchmarks

G Cui, L Yuan, B He, Y Chen… - Advances in Neural …, 2022 - proceedings.neurips.cc
Textual backdoor attacks are a practical threat to NLP systems. By injecting a
backdoor in the training phase, the adversary could control model predictions via predefined …

Black-box access is insufficient for rigorous AI audits

S Casper, C Ezell, C Siegmann, N Kolt… - The 2024 ACM …, 2024 - dl.acm.org
External audits of AI systems are increasingly recognized as a key mechanism for AI
governance. The effectiveness of an audit, however, depends on the degree of access …

BadCLIP: Dual-embedding guided backdoor attack on multimodal contrastive learning

S Liang, M Zhu, A Liu, B Wu, X Cao… - Proceedings of the …, 2024 - openaccess.thecvf.com
While existing backdoor attacks have successfully infected multimodal contrastive learning
models such as CLIP, they can be easily countered by specialized backdoor defenses for …

Enhancing fine-tuning based backdoor defense with sharpness-aware minimization

M Zhu, S Wei, L Shen, Y Fan… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Backdoor defense, which aims to detect or mitigate the effect of malicious triggers introduced
by attackers, is becoming increasingly critical for machine learning security and integrity …

Detecting backdoors during the inference stage based on corruption robustness consistency

X Liu, M Li, H Wang, S Hu, D Ye… - Proceedings of the …, 2023 - openaccess.thecvf.com
Deep neural networks have been proven vulnerable to backdoor attacks. Detecting the trigger
samples during the inference stage, i.e., test-time trigger sample detection, can prevent …

BadGPT: Exploring security vulnerabilities of ChatGPT via backdoor attacks to InstructGPT

J Shi, Y Liu, P Zhou, L Sun - arXiv preprint arXiv:2304.12298, 2023 - arxiv.org
Recently, ChatGPT has gained significant attention in research due to its ability to interact
with humans effectively. The core idea behind this model is reinforcement learning (RL) fine …