A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly

Y Yao, J Duan, K Xu, Y Cai, Z Sun, Y Zhang - High-Confidence Computing, 2024 - Elsevier
Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …

I know what you trained last summer: A survey on stealing machine learning models and defences

D Oliynyk, R Mayer, A Rauber - ACM Computing Surveys, 2023 - dl.acm.org
Machine-Learning-as-a-Service (MLaaS) has become a widespread paradigm, making
even the most complex Machine Learning models available for clients via, e.g., a pay-per …
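
A minimal sketch of the extraction setting this survey covers, assuming PyTorch and using a toy local network in place of a real MLaaS endpoint (`query_victim`, the architectures, and all hyperparameters here are illustrative, not from the paper): the attacker trains a local surrogate purely on the probabilities the victim returns.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Frozen local network standing in for the remote pay-per-query API;
# in the real setting the attacker sees only its outputs.
victim = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5)).eval()
surrogate = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))

def query_victim(x):
    with torch.no_grad():                      # black-box access: outputs only
        return F.softmax(victim(x), dim=1)

opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for step in range(500):
    x = torch.randn(64, 20)                    # attacker-chosen queries
    soft_labels = query_victim(x)              # each call would cost money
    loss = F.kl_div(F.log_softmax(surrogate(x), dim=1),
                    soft_labels, reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
```

The surveyed attacks differ mainly in how queries are chosen and what the API reveals (probabilities vs. labels); the loop above is the common skeleton.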

Trustworthy LLMs: A survey and guideline for evaluating large language models' alignment

Y Liu, Y Yao, JF Ton, X Zhang, R Guo, H Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Ensuring alignment, which refers to making models behave in accordance with human
intentions [1, 2], has become a critical task before deploying large language models (LLMs) …

On protecting the data privacy of large language models (LLMs): A survey

B Yan, K Li, M Xu, Y Dong, Y Zhang, Z Ren… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are complex artificial intelligence systems capable of
understanding, generating and translating human language. They learn language patterns …

BppAttack: Stealthy and efficient Trojan attacks against deep neural networks via image quantization and contrastive adversarial learning

Z Wang, J Zhai, S Ma - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
Deep neural networks are vulnerable to Trojan attacks. Existing attacks use visible patterns
(e.g., a patch or image transformations) as triggers, which are vulnerable to human …

Towards data-free model stealing in a hard label setting

S Sanyal, S Addepalli, RV Babu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Machine learning models deployed as a service (MLaaS) are susceptible to model
stealing attacks, where an adversary attempts to steal the model within a restricted access …
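
A minimal sketch of the hard-label restriction this paper works under, again with a toy local victim (all names and architectures are illustrative): the API returns only the top-1 class, so the clone can be trained with plain cross-entropy at best. The paper's actual method synthesizes queries with a learned generator rather than the random noise used here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

victim = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5)).eval()
clone = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))

def hard_label(x):
    with torch.no_grad():
        return victim(x).argmax(dim=1)       # top-1 label, no confidence scores

opt = torch.optim.Adam(clone.parameters(), lr=1e-3)
for step in range(500):
    x = torch.randn(64, 20)                  # data-free: synthetic queries
    loss = F.cross_entropy(clone(x), hard_label(x))  # all a hard label permits
    opt.zero_grad(); loss.backward(); opt.step()
```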

Towards efficient data free black-box adversarial attack

J Zhang, B Li, J Xu, S Wu, S Ding… - Proceedings of the …, 2022 - openaccess.thecvf.com
Classic black-box adversarial attacks can take advantage of transferable adversarial
examples generated by a similar substitute model to successfully fool the target model …
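
A sketch of the classic transfer attack this snippet describes, assuming PyTorch and using FGSM on a white-box substitute (the paper's contribution, doing this efficiently without training data, goes beyond this toy): adversarial examples are crafted locally and then replayed against the black-box target.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

substitute = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
black_box = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5)).eval()

def fgsm(x, y, eps=0.1):
    """Craft an adversarial example on the white-box substitute."""
    x = x.clone().requires_grad_(True)
    F.cross_entropy(substitute(x), y).backward()
    return (x + eps * x.grad.sign()).detach()

x, y = torch.randn(8, 20), torch.randint(0, 5, (8,))
x_adv = fgsm(x, y)                           # crafted locally ...
fooled = (black_box(x_adv).argmax(1) != y)   # ... transferred to the target
print(f"transfer fooling rate: {fooled.float().mean():.2f}")
```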

Lion: Adversarial distillation of proprietary large language models

Y Jiang, C Chan, M Chen, W Wang - arXiv preprint arXiv:2305.12870, 2023 - arxiv.org
The practice of transferring knowledge from a sophisticated, proprietary large language
model (LLM) to a compact, open-source LLM has garnered considerable attention. Previous …
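
A skeleton of the imitation-style distillation the snippet describes, assuming the `transformers` library with GPT-2 as a stand-in student; `teacher_generate` is a hypothetical placeholder for the proprietary vendor API, and Lion's distinguishing adversarial loop (mining hard instructions from the student's failures) is omitted.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def teacher_generate(prompt: str) -> str:
    """Placeholder for the proprietary teacher's text-completion API."""
    return "..."  # in practice, a (paid) remote call

tok = AutoTokenizer.from_pretrained("gpt2")
student = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(student.parameters(), lr=5e-5)

prompts = ["Explain knowledge distillation in one sentence."]
for p in prompts:
    text = p + " " + teacher_generate(p)          # (instruction, response) pair
    batch = tok(text, return_tensors="pt")
    loss = student(**batch, labels=batch["input_ids"]).loss
    opt.zero_grad(); loss.backward(); opt.step()
```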

Explainable artificial intelligence for cybersecurity: a literature survey

F Charmet, HC Tanuwidjaja, S Ayoubi… - Annals of …, 2022 - Springer
With the extensive application of deep learning (DL) algorithms in recent years, e.g., for
detecting Android malware or vulnerable source code, artificial intelligence (AI) and …

Learning to retain while acquiring: Combating distribution-shift in adversarial data-free knowledge distillation

G Patel, KR Mopuri, Q Qiu - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Data-free Knowledge Distillation (DFKD) has gained popularity recently, with the
fundamental idea of carrying out knowledge transfer from a Teacher neural network to a …
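
A minimal sketch of the generic adversarial DFKD loop this paper builds on, with toy PyTorch networks (all architectures and hyperparameters are illustrative): a generator is trained to produce pseudo-samples where student and teacher disagree, and the student is trained to match the teacher on them. The paper's actual contribution, combating the distribution shift this adversarial game induces, is not shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5)).eval()
student = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 5))
generator = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 20))
for p in teacher.parameters():
    p.requires_grad_(False)                       # frozen, but differentiable

def kl(s_logits, t_logits):
    return F.kl_div(F.log_softmax(s_logits, dim=1),
                    F.softmax(t_logits, dim=1), reduction="batchmean")

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(300):
    # Generator seeks pseudo-samples on which student and teacher disagree.
    x = generator(torch.randn(64, 10))
    loss_g = -kl(student(x), teacher(x))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    # Student matches the teacher on freshly generated pseudo-samples.
    x = generator(torch.randn(64, 10)).detach()
    loss_s = kl(student(x), teacher(x))
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
```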