Universal and transferable adversarial attacks on aligned language models

A Zou, Z Wang, JZ Kolter, M Fredrikson - arXiv preprint arXiv:2307.15043, 2023 - arxiv.org
Because" out-of-the-box" large language models are capable of generating a great deal of
objectionable content, recent work has focused on aligning these models in an attempt to …

Visual adversarial examples jailbreak aligned large language models

X Qi, K Huang, A Panda, P Henderson… - Proceedings of the …, 2024 - ojs.aaai.org
Warning: this paper contains data, prompts, and model outputs that are offensive in nature.
Recently, there has been a surge of interest in integrating vision into Large Language …

Invisible for both camera and lidar: Security of multi-sensor fusion based perception in autonomous driving under physical-world attacks

Y Cao, N Wang, C Xiao, D Yang, J Fang… - … IEEE symposium on …, 2021 - ieeexplore.ieee.org
In Autonomous Driving (AD) systems, perception is both security and safety critical. Despite
various prior studies on its security issues, all of them only consider attacks on camera- or …

RAB: Provable robustness against backdoor attacks

M Weber, X Xu, B Karlaš, C Zhang… - 2023 IEEE Symposium …, 2023 - ieeexplore.ieee.org
Recent studies have shown that deep neural networks (DNNs) are vulnerable to
adversarial attacks, including evasion and backdoor (poisoning) attacks. On the defense …

LOT: Layer-wise orthogonal training on improving $\ell_2$ certified robustness

X Xu, L Li, B Li - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
Recent studies show that training deep neural networks (DNNs) with Lipschitz constraints
is able to enhance adversarial robustness and other model properties such as stability. In …
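
Since the snippet breaks off before the construction, a minimal Python sketch of one standard way to impose such a constraint may help: a layer whose weight is orthogonal by construction (here via the Cayley transform) is exactly 1-Lipschitz in $\ell_2$. The class name and parameterization below are illustrative assumptions, not the paper's actual LOT layer.

```python
import torch
import torch.nn as nn

class CayleyOrthogonalLinear(nn.Module):
    """Linear layer with an orthogonal weight, hence exactly 1-Lipschitz in l2."""
    def __init__(self, dim: int):
        super().__init__()
        self.raw = nn.Parameter(0.01 * torch.randn(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        A = self.raw - self.raw.T                     # skew-symmetric part: A^T = -A
        I = torch.eye(A.shape[0], device=x.device, dtype=x.dtype)
        W = torch.linalg.solve(I + A, I - A)          # Cayley transform -> orthogonal W
        return x @ W.T

layer = CayleyOrthogonalLinear(4)
x, y = torch.randn(1, 4), torch.randn(1, 4)
# Orthogonal maps preserve l2 distances, so these two prints match.
print(torch.dist(layer(x), layer(y)).item())
print(torch.dist(x, y).item())
```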

Visual adversarial examples jailbreak large language models

X Qi, K Huang, A Panda, M Wang, P Mittal - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, there has been a surge of interest in introducing vision into Large Language
Models (LLMs). The proliferation of large Visual Language Models (VLMs), such as …

SmoothMix: Training confidence-calibrated smoothed classifiers for certified robustness

J Jeong, S Park, M Kim, HC Lee… - Advances in Neural …, 2021 - proceedings.neurips.cc
Randomized smoothing is currently a state-of-the-art method to construct a certifiably robust
classifier from neural networks against $\ell_2$-adversarial perturbations. Under the …
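
The snippet names the core mechanism of randomized smoothing; a minimal sketch of the standard prediction-and-certification loop (the Cohen et al. style $R = \sigma\,\Phi^{-1}(p_A)$ bound, not this paper's SmoothMix training procedure) may make it concrete. The toy `clf` and the point-estimate confidence are illustrative simplifications.

```python
import numpy as np
from scipy.stats import norm

def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=1000, n_classes=2):
    """Majority vote of the base classifier under Gaussian input noise."""
    counts = np.zeros(n_classes, dtype=int)
    for _ in range(n_samples):
        counts[base_classifier(x + sigma * np.random.randn(*x.shape))] += 1
    top = int(counts.argmax())
    # Point estimate of the top-class probability; a real certificate would
    # replace this with a Clopper-Pearson lower confidence bound.
    p_a = min(counts[top] / n_samples, 1 - 1e-6)
    radius = sigma * norm.ppf(p_a) if p_a > 0.5 else 0.0  # certified l2 radius
    return top, radius

# Toy base classifier: predict class 1 iff the first coordinate is positive.
clf = lambda z: int(z[0] > 0)
label, radius = smoothed_predict(clf, np.array([0.8, -0.1]))
print(label, radius)
```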

DiffSmooth: Certifiably robust learning via diffusion models and local smoothing

J Zhang, Z Chen, H Zhang, C Xiao, B Li - 32nd USENIX Security …, 2023 - usenix.org
Diffusion models have been leveraged to perform adversarial purification and thus provide
both empirical and certified robustness for a standard model. On the other hand, different …
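
A minimal sketch of the two-stage pattern the snippet names (purify with a diffusion model, then smooth locally) follows; `denoiser` and `classifier` are hypothetical stand-ins, and this is an illustration of the pattern under those assumptions, not the DiffSmooth implementation.

```python
import numpy as np

# Hypothetical stand-ins: an identity "denoiser" plays the role of a one-shot
# diffusion denoiser, and a toy sign rule plays the role of a trained classifier.
denoiser = lambda x_noisy, sigma: x_noisy
classifier = lambda z: int(z[0] > 0)

def purify_then_locally_smooth(x, sigma=0.5, local_sigma=0.1, n_votes=100, n_classes=2):
    # Step 1: purification -- denoise a sigma-noised copy of the input;
    # this is where the diffusion model would be invoked.
    purified = denoiser(x + sigma * np.random.randn(*x.shape), sigma)
    # Step 2: local smoothing -- majority vote over small perturbations of
    # the purified point so the final prediction is locally stable.
    counts = np.zeros(n_classes, dtype=int)
    for _ in range(n_votes):
        counts[classifier(purified + local_sigma * np.random.randn(*x.shape))] += 1
    return int(counts.argmax())

print(purify_then_locally_smooth(np.array([0.8, -0.1])))
```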

TRS: Transferability reduced ensemble via promoting gradient diversity and model smoothness

Z Yang, L Li, X Xu, S Zuo, Q Chen… - Advances in …, 2021 - proceedings.neurips.cc
Adversarial transferability is an intriguing property: an adversarial perturbation crafted
against one model is also effective against another model, even when these models are from different …
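
Operationally, transferability is measured by attacking a source model and testing the same perturbation on a target model. A minimal sketch with toy linear models (standing in for trained networks; `fgsm_against_linear` is an illustrative helper, not the paper's method) follows.

```python
import numpy as np

def fgsm_against_linear(w, x, y, eps):
    # Fast-gradient-sign step on a hinge-style loss for a linear scorer w.x
    # with label y in {-1, +1}; the loss gradient w.r.t. x is -y * w.
    return x + eps * np.sign(-y * w)

rng = np.random.default_rng(0)
w_source = rng.normal(size=5)                    # source (surrogate) model weights
w_target = w_source + 0.3 * rng.normal(size=5)   # a different, related target model
x, y = rng.normal(size=5), 1

x_adv = fgsm_against_linear(w_source, x, y, eps=1.0)
print("fools source:", np.sign(w_source @ x_adv) != np.sign(w_source @ x))
print("transfers to target:", np.sign(w_target @ x_adv) != np.sign(w_target @ x))
```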

CROP: Certifying robust policies for reinforcement learning through functional smoothing

F Wu, L Li, Z Huang, Y Vorobeychik, D Zhao… - arXiv preprint arXiv …, 2021 - arxiv.org
As reinforcement learning (RL) has achieved great success and has even been adopted in
safety-critical domains such as autonomous vehicles, a range of empirical studies have …