Backdoor attacks and countermeasures on deep learning: A comprehensive review

Y Gao, BG Doan, Z Zhang, S Ma, J Zhang, A Fu… - arXiv preprint arXiv …, 2020 - arxiv.org
This work provides the community with a timely comprehensive review of backdoor attacks
and countermeasures on deep learning. According to the attacker's capability and affected …

Unsolved problems in ML safety

D Hendrycks, N Carlini, J Schulman… - arXiv preprint arXiv …, 2021 - arxiv.org
Machine learning (ML) systems are rapidly increasing in size, are acquiring new
capabilities, and are increasingly deployed in high-stakes settings. As with other powerful …

BackdoorBench: A comprehensive benchmark of backdoor learning

B Wu, H Chen, M Zhang, Z Zhu, S Wei… - Advances in …, 2022 - proceedings.neurips.cc
Backdoor learning is an emerging and vital topic for studying the vulnerability of deep
neural networks (DNNs). Many pioneering backdoor attack and defense methods are being …

Backdoor learning for NLP: Recent advances, challenges, and future research directions

M Omar - arXiv preprint arXiv:2302.06801, 2023 - arxiv.org
Although backdoor learning is an active research topic in the NLP domain, the literature
lacks studies that systematically categorize and summarize backdoor attacks and defenses …

Sleeper agents: Training deceptive LLMs that persist through safety training

E Hubinger, C Denison, J Mu, M Lambert… - arXiv preprint arXiv …, 2024 - arxiv.org
Humans are capable of strategically deceptive behavior: behaving helpfully in most
situations, but then behaving very differently in order to pursue alternative objectives when …

Training with more confidence: Mitigating injected and natural backdoors during training

Z Wang, H Ding, J Zhai, S Ma - Advances in Neural …, 2022 - proceedings.neurips.cc
The backdoor or Trojan attack is a severe threat to deep neural networks (DNNs).
Researchers find that DNNs trained on benign data and settings can also learn backdoor …

Dual-key multimodal backdoors for visual question answering

M Walmer, K Sikka, I Sur… - Proceedings of the …, 2022 - openaccess.thecvf.com
The success of deep learning has enabled advances in multimodal tasks that require
non-trivial fusion of multiple input domains. Although multimodal models have shown potential in …

Provable defense against backdoor policies in reinforcement learning

S Bharti, X Zhang, A Singla… - Advances in Neural …, 2022 - proceedings.neurips.cc
We propose a provable defense mechanism against backdoor policies in reinforcement
learning under subspace trigger assumption. A backdoor policy is a security threat where an …

Trojan signatures in DNN weights

G Fields, M Samragh, M Javaheripi… - Proceedings of the …, 2021 - openaccess.thecvf.com
Deep neural networks have been shown to be vulnerable to backdoor, or Trojan, attacks
where an adversary has embedded a trigger in the network at training time such that the …

Trojan attack and defense for deep learning based navigation systems of unmanned aerial vehicles

M Mynuddin, SU Khan, R Ahmari, L Landivar… - IEEE …, 2024 - ieeexplore.ieee.org
As unmanned aerial vehicles (UAVs) become increasingly integrated across various
domains, both military and civilian, safeguarding the security of their navigation systems …