SmoothLLM: Defending large language models against jailbreaking attacks

A Robey, E Wong, H Hassani, GJ Pappas - arXiv preprint arXiv …, 2023 - arxiv.org
Despite efforts to align large language models (LLMs) with human values, widely-used
LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …
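
The snippet cuts off before the method. As a rough illustration of the SmoothLLM idea (randomly perturb several copies of the prompt, then aggregate the model's responses by majority vote), here is a minimal Python sketch; the `llm` and `is_jailbroken` callables are hypothetical placeholders, not the paper's interface.

```python
import random
import string

def perturb(prompt: str, q: float = 0.1) -> str:
    # Randomly replace a fraction q of the characters (one of several
    # character-level perturbations; insert/patch variants are analogous).
    chars = list(prompt)
    k = max(1, int(q * len(chars)))
    for i in random.sample(range(len(chars)), k):
        chars[i] = random.choice(string.printable)
    return "".join(chars)

def smoothllm(llm, is_jailbroken, prompt: str, n_copies: int = 10, q: float = 0.1) -> str:
    # Query the LLM on perturbed copies of the prompt and follow the
    # majority verdict; `llm` and `is_jailbroken` are stand-ins for a
    # model API and a jailbreak detector on responses.
    responses = [llm(perturb(prompt, q)) for _ in range(n_copies)]
    verdicts = [is_jailbroken(r) for r in responses]
    if sum(verdicts) > n_copies / 2:
        return "Sorry, I can't help with that."  # majority flags an attack
    # Otherwise return a response consistent with the benign majority.
    return next(r for r, v in zip(responses, verdicts) if not v)
```

The intuition is that adversarial suffixes are brittle to character-level noise, while benign prompts usually survive it.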

Invisible for both camera and LiDAR: Security of multi-sensor fusion based perception in autonomous driving under physical-world attacks

Y Cao, N Wang, C Xiao, D Yang, J Fang… - … IEEE symposium on …, 2021 - ieeexplore.ieee.org
In Autonomous Driving (AD) systems, perception is both security and safety critical. Despite
various prior studies on its security issues, all of them only consider attacks on camera- or …

Rethinking Lipschitz neural networks and certified robustness: A Boolean function perspective

B Zhang, D Jiang, D He… - Advances in neural …, 2022 - proceedings.neurips.cc
Designing neural networks with bounded Lipschitz constant is a promising way to obtain
certifiably robust classifiers against adversarial examples. However, the relevant progress …
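
For context on what a Lipschitz bound buys, here is a minimal sketch of the generic margin certificate such classifiers use (this is the standard bound, not the paper's tighter Boolean-function analysis; names are illustrative).

```python
import numpy as np

def certified_radius(logits: np.ndarray, lip: float) -> float:
    # If the logit map is lip-Lipschitz under the l2 norm, a margin m
    # between the top two logits certifies the predicted class within
    # l2 radius m / (sqrt(2) * lip): no smaller perturbation can swap
    # the two leading logits.
    top2 = np.sort(logits)[-2:]
    return float(top2[1] - top2[0]) / (np.sqrt(2) * lip)

# e.g. certified_radius(np.array([3.0, 0.5, -1.0]), lip=2.0) ~= 0.88
```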

SoK: Certified robustness for deep neural networks

L Li, T Xie, B Li - 2023 IEEE symposium on security and privacy …, 2023 - ieeexplore.ieee.org
Great advances in deep neural networks (DNNs) have led to state-of-the-art performance on
a wide range of tasks. However, recent studies have shown that DNNs are vulnerable to …

Certified training: Small boxes are all you need

MN Müller, F Eckert, M Fischer, M Vechev - arXiv preprint arXiv …, 2022 - arxiv.org
To obtain deterministic guarantees of adversarial robustness, specialized training methods
are used. We propose SABR, a novel certified training method based on the key …
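
As a rough illustration of the box arithmetic that certified training builds on, here is a minimal interval-bound-propagation (IBP) sketch with a comment indicating the small-box twist; all names and constants are illustrative, not the paper's code.

```python
import numpy as np

def ibp_affine(lo, hi, W, b):
    # Propagate the box [lo, hi] through x -> W @ x + b with interval
    # arithmetic: the center maps exactly, the radius scales by |W|.
    mid, rad = (lo + hi) / 2.0, (hi - lo) / 2.0
    new_mid = W @ mid + b
    new_rad = np.abs(W) @ rad
    return new_mid - new_rad, new_mid + new_rad

def ibp_relu(lo, hi):
    # ReLU is monotone, so it maps box endpoints to box endpoints.
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

# Small-box idea (sketch): instead of propagating the full eps-box around
# the input, propagate a box of radius tau * eps (tau << 1) centered at an
# adversarially chosen point inside the eps-ball, and train on the much
# tighter resulting output bounds.
x = np.array([0.5, -0.2])
eps, tau = 0.1, 0.2
center = x + 0.05  # stand-in for an adversarially chosen center
lo, hi = center - tau * eps, center + tau * eps
W, b = np.array([[1.0, -1.0], [0.5, 2.0]]), np.zeros(2)
lo, hi = ibp_relu(*ibp_affine(lo, hi, W, b))
```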

Deep partition aggregation: Provable defense against general poisoning attacks

A Levine, S Feizi - arXiv preprint arXiv:2006.14768, 2020 - arxiv.org
Adversarial poisoning attacks distort training data in order to corrupt the test-time behavior of
a classifier. A provable defense provides a certificate for each test sample, which is a lower …
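
The partition-and-vote scheme is simple enough to sketch. Below is a minimal illustration (hash-based partitioning plus a majority vote with a vote-gap certificate); names are illustrative and the exact tie-breaking term of the paper's bound is elided.

```python
import hashlib
from collections import Counter

def partition_of(example_id: str, k: int) -> int:
    # Deterministically hash each training example into one of k partitions,
    # so any single poisoned (or removed) example affects at most one of the
    # k independently trained base classifiers.
    return int(hashlib.sha256(example_id.encode()).hexdigest(), 16) % k

def dpa_predict(base_preds: list):
    # Majority vote over the k base classifiers. Because one poisoned point
    # flips at most one vote, a top-vs-runner-up gap of g certifies the
    # prediction against roughly floor(g / 2) poisoned points.
    counts = Counter(base_preds)
    (top_cls, n1), (_, n2) = (counts.most_common(2) + [(None, 0)])[:2]
    return top_cls, (n1 - n2) // 2

# e.g. dpa_predict(["cat"] * 40 + ["dog"] * 10) -> ("cat", 15)
```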

RAB: Provable robustness against backdoor attacks

M Weber, X Xu, B Karlaš, C Zhang… - 2023 IEEE Symposium …, 2023 - ieeexplore.ieee.org
Recent studies have shown that deep neural networks (DNNs) are vulnerable to
adversarial attacks, including evasion and backdoor (poisoning) attacks. On the defense …

Certified patch robustness via smoothed vision transformers

H Salman, S Jain, E Wong… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Certified patch defenses can guarantee robustness of an image classifier to arbitrary
changes within a bounded contiguous region. But, currently, this robustness comes at a cost …
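
These defenses build on column-style derandomized smoothing; here is a minimal sketch of that certificate with illustrative names and shapes. The speedup the smoothed-ViT work targets comes from dropping the masked tokens inside the transformer, which this sketch does not model.

```python
import numpy as np
from collections import Counter

def column_ablations(image: np.ndarray, band: int):
    # Yield one copy of the image per starting column, keeping only a
    # band of `band` columns visible and zeroing everything else.
    h, w, c = image.shape
    for s in range(w):
        out = np.zeros_like(image)
        cols = [(s + i) % w for i in range(band)]
        out[:, cols, :] = image[:, cols, :]
        yield out

def certify_patch(preds: list, patch_width: int, band: int):
    # An adversarial patch of width m overlaps at most m + band - 1 of the
    # column bands, so the majority class is certified whenever the
    # top-vs-runner-up vote gap exceeds twice that number.
    counts = Counter(preds)
    (top_cls, n1), (_, n2) = (counts.most_common(2) + [(None, 0)])[:2]
    return top_cls, (n1 - n2) > 2 * (patch_width + band - 1)
```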

SmoothMix: Training confidence-calibrated smoothed classifiers for certified robustness

J Jeong, S Park, M Kim, HC Lee… - Advances in Neural …, 2021 - proceedings.neurips.cc
Randomized smoothing is currently a state-of-the-art method to construct a certifiably robust
classifier from neural networks against $\ell_2$-adversarial perturbations. Under the …
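
For reference, here is a minimal sketch of the randomized-smoothing prediction and the Cohen et al. style $\ell_2$ certificate that training methods like this aim to enlarge; `f` is a hypothetical base classifier, and the confidence-interval machinery of a full implementation is elided.

```python
import numpy as np
from scipy.stats import norm

def smoothed_predict(f, x, sigma: float, n: int = 1000):
    # Monte-Carlo estimate of the smoothed classifier
    #   g(x) = argmax_c P[f(x + delta) = c],  delta ~ N(0, sigma^2 I),
    # with certified l2 radius R = sigma * Phi^{-1}(p_A).
    # A rigorous version lower-bounds p_A with a confidence interval
    # (e.g. Clopper-Pearson); here we just clip to avoid an infinite radius.
    noise = sigma * np.random.randn(n, *np.shape(x))
    preds = np.array([f(x + d) for d in noise])
    classes, counts = np.unique(preds, return_counts=True)
    i = counts.argmax()
    p_a = min(counts[i] / n, 1.0 - 1e-6)
    radius = sigma * norm.ppf(p_a) if p_a > 0.5 else 0.0
    return classes[i], radius
```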

Prompt certified machine unlearning with randomized gradient smoothing and quantization

Z Zhang, Y Zhou, X Zhao, T Che… - Advances in Neural …, 2022 - proceedings.neurips.cc
The right to be forgotten calls for efficient machine unlearning techniques that make trained
machine learning models forget a cohort of data. The combination of training and unlearning …