Threats to pre-trained language models: Survey and taxonomy

S Guo, C Xie, J Li, L Lyu, T Zhang - arXiv preprint arXiv:2202.06862, 2022 - arxiv.org
Pre-trained language models (PTLMs) have achieved great success and remarkable
performance over a wide range of natural language processing (NLP) tasks. However, there …

Stealing part of a production language model

N Carlini, D Paleka, KD Dvijotham, T Steinke… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce the first model-stealing attack that extracts precise, nontrivial information from
black-box production language models like OpenAI's ChatGPT or Google's PaLM-2 …

Fingerprinting deep neural networks globally via universal adversarial perturbations

Z Peng, S Li, G Chen, C Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
In this paper, we propose a novel and practical mechanism which enables the service
provider to verify whether a suspect model is stolen from the victim model via model …

Understanding causality with large language models: Feasibility and opportunities

C Zhang, S Bauer, P Bennett, J Gao, W Gong… - arXiv preprint arXiv …, 2023 - arxiv.org
We assess the ability of large language models (LLMs) to answer causal questions by
analyzing their strengths and weaknesses against three types of causal question. We …

Defending against data-free model extraction by distributionally robust defensive training

Z Wang, L Shen, T Liu, T Duan, Y Zhu… - Advances in …, 2024 - proceedings.neurips.cc
Data-Free Model Extraction (DFME) aims to clone a black-box model without
knowing its original training data distribution, making it much easier for attackers to steal …

Privacy inference attack and defense in centralized and federated learning: A comprehensive survey

B Rao, J Zhang, D Wu, C Zhu, X Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The emergence of new machine learning methods has led to their widespread application
across various domains, significantly advancing the field of artificial intelligence. However …

StolenEncoder: Stealing pre-trained encoders in self-supervised learning

Y Liu, J Jia, H Liu, NZ Gong - Proceedings of the 2022 ACM SIGSAC …, 2022 - dl.acm.org
Pre-trained encoders are general-purpose feature extractors that can be used for many
downstream tasks. Recent progress in self-supervised learning makes it possible to pre-train highly effective …

SoK: Machine learning governance

V Chandrasekaran, H Jia, A Thudi, A Travers… - arXiv preprint arXiv …, 2021 - arxiv.org
The application of machine learning (ML) in computer systems introduces not only many
benefits but also risks to society. In this paper, we develop the concept of ML governance to …

Practical and efficient model extraction of sentiment analysis APIs

W Wu, J Zhang, VJ Wei, X Chen… - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
Despite their impressive performance, deep learning models are formidable to develop from
scratch. This has popularized Machine-Learning-as-a-Service (MLaaS), where …

Sentence embedding encoders are easy to steal but hard to defend

A Dziedzic, F Boenisch, M Jiang, H Duan, N Papernot - 2023 - publications.cispa.de
Self-supervised learning (SSL) has become the predominant approach to training on large
amounts of data when no labels are available. Since the corresponding model architectures …