Z Coalson, J Woo, S Chen, Y Sun, L Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce a new class of attacks on commercial-scale (human-aligned) language
models that induce jailbreaking through targeted bitwise corruptions in model parameters …