An overview of catastrophic AI risks

D Hendrycks, M Mazeika, T Woodside - arXiv preprint arXiv:2306.12001, 2023 - arxiv.org
Rapid advancements in artificial intelligence (AI) have sparked growing concerns among
experts, policymakers, and world leaders regarding the potential for increasingly advanced …

Wild patterns reloaded: A survey of machine learning security against training data poisoning

AE Cinà, K Grosse, A Demontis, S Vascon… - ACM Computing …, 2023 - dl.acm.org
The success of machine learning is fueled by the increasing availability of computing power
and large training datasets. The training data is used to learn new models or update existing …

Unsolved problems in ML safety

D Hendrycks, N Carlini, J Schulman… - arXiv preprint arXiv …, 2021 - arxiv.org
Machine learning (ML) systems are rapidly increasing in size, are acquiring new
capabilities, and are increasingly deployed in high-stakes settings. As with other powerful …

TrojDiff: Trojan attacks on diffusion models with diverse targets

W Chen, D Song, B Li - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Diffusion models have achieved great success in a range of tasks, such as image synthesis
and molecule design. As such successes hinge on large-scale training data collected from …

BadEncoder: Backdoor attacks to pre-trained encoders in self-supervised learning

J Jia, Y Liu, NZ Gong - 2022 IEEE Symposium on Security and …, 2022 - ieeexplore.ieee.org
Self-supervised learning in computer vision aims to pre-train an image encoder using a
large amount of unlabeled images or (image, text) pairs. The pre-trained image encoder can …

Dataset security for machine learning: Data poisoning, backdoor attacks, and defenses

M Goldblum, D Tsipras, C Xie, X Chen… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
As machine learning systems grow in scale, so do their training data requirements, forcing
practitioners to automate and outsource the curation of training data in order to achieve state …

Hidden Killer: Invisible textual backdoor attacks with syntactic trigger

F Qi, M Li, Y Chen, Z Zhang, Z Liu, Y Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
Backdoor attacks are a kind of insidious security threat against machine learning models.
After being injected with a backdoor in training, the victim model will produce adversary …

BadNL: Backdoor attacks against NLP models with semantic-preserving improvements

X Chen, A Salem, D Chen, M Backes, S Ma… - Proceedings of the 37th …, 2021 - dl.acm.org
Deep neural networks (DNNs) have progressed rapidly during the past decade and have
been deployed in various real-world applications. Meanwhile, DNN models have been …

ONION: A simple and effective defense against textual backdoor attacks

F Qi, Y Chen, M Li, Y Yao, Z Liu, M Sun - arXiv preprint arXiv:2011.10369, 2020 - arxiv.org
Backdoor attacks are an emergent training-time threat to deep neural networks
(DNNs). They can manipulate the output of DNNs and are highly insidious. In the …

Privacy in large language models: Attacks, defenses and future directions

H Li, Y Chen, J Luo, J Wang, H Peng, Y Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …