Trustworthy LLMs: A survey and guideline for evaluating large language models' alignment

Y Liu, Y Yao, JF Ton, X Zhang, R Guo, H Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Ensuring alignment, which refers to making models behave in accordance with human
intentions [1, 2], has become a critical task before deploying large language models (LLMs) …

Unified concept editing in diffusion models

R Gandikota, H Orgad, Y Belinkov… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-image models suffer from various safety issues that may limit their suitability for
deployment. Previous methods have separately addressed individual issues of bias …
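
UCE-style methods edit the diffusion model's cross-attention projection weights in closed form, remapping the embeddings of targeted concepts while preserving outputs for everything else. Below is a minimal numpy sketch of such a closed-form update; the function name, the ridge weight lam, and the exact objective are illustrative assumptions, not the paper's precise formulation:

```python
import numpy as np

def edit_projection(W, edit_keys, edit_values, preserve_keys, lam=0.1):
    """Closed-form update of a cross-attention projection W (d_out x d_in).

    Maps each edit key to its target value while keeping outputs on the
    preservation set (and W itself, via the ridge term lam) close to the
    original. Solves
        min_W'  sum_e ||W' k_e - v_e||^2
              + sum_p ||W' k_p - W k_p||^2 + lam * ||W' - W||_F^2
    """
    d_in = W.shape[1]
    # Stationarity condition rearranged into W' = A @ inv(B)
    A = edit_values.T @ edit_keys + (W @ preserve_keys.T) @ preserve_keys + lam * W
    B = edit_keys.T @ edit_keys + preserve_keys.T @ preserve_keys + lam * np.eye(d_in)
    return A @ np.linalg.inv(B)
```

Because the update is linear and closed form, many concepts can be edited in one solve, which is what makes the "unified" treatment of bias, copyright, and offensive content tractable.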

Membership inference attacks against language models via neighbourhood comparison

J Mattern, F Mireshghallah, Z Jin, B Schölkopf… - arXiv preprint arXiv …, 2023 - arxiv.org
Membership inference attacks (MIAs) aim to predict whether a data sample was present in
the training data of a machine learning model, and are widely used for assessing the …
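
The neighbourhood-comparison idea is to score a candidate sample by comparing the model's loss on it against the average loss on slightly perturbed "neighbour" texts, avoiding the need for a reference model trained on similar data. A hedged sketch against the Hugging Face transformers interface (neighbour generation, e.g. via masked-LM word substitution, and threshold calibration are omitted; names are illustrative):

```python
import torch

def neighbourhood_score(model, tokenizer, text, neighbours, device="cpu"):
    """Membership score: loss on the sample minus the mean loss on its
    perturbed neighbours. A strongly negative score suggests the exact
    sample was memorized, i.e. it is a likely training member."""
    def nll(t):
        ids = tokenizer(t, return_tensors="pt").input_ids.to(device)
        with torch.no_grad():
            return model(ids, labels=ids).loss.item()
    return nll(text) - sum(nll(n) for n in neighbours) / len(neighbours)
```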

Machine unlearning: Solutions and challenges

J Xu, Z Wu, C Wang, X Jia - IEEE Transactions on Emerging …, 2024 - ieeexplore.ieee.org
Machine learning models may inadvertently memorize sensitive, unauthorized, or malicious
data, posing risks of privacy breaches, security vulnerabilities, and performance …
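
One family of approximate-unlearning solutions covered by such surveys fine-tunes the trained model by ascending the loss on the data to be forgotten while descending it on retained data to preserve utility. A minimal PyTorch sketch of that baseline; all names are illustrative and it is not any specific algorithm from the survey:

```python
from itertools import cycle
import torch

def approximate_unlearn(model, forget_loader, retain_loader, lr=1e-4, steps=100):
    """Gradient-ascent unlearning baseline: raise the loss on the forget
    set while keeping it low on retained data."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    forget_it, retain_it = cycle(forget_loader), cycle(retain_loader)
    for _ in range(steps):
        xf, yf = next(forget_it)
        xr, yr = next(retain_it)
        opt.zero_grad()
        # minus sign = gradient ascent on the batch to be forgotten
        loss = -loss_fn(model(xf), yf) + loss_fn(model(xr), yr)
        loss.backward()
        opt.step()
    return model
```

Heuristics like this are cheap but offer no formal removal guarantee, which is exactly the solutions-versus-challenges tension such surveys examine.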

Deep regression unlearning

AK Tarun, VS Chundawat, M Mandal… - International …, 2023 - proceedings.mlr.press
With the introduction of data protection and privacy regulations, it has become crucial to
remove the lineage of data on demand from a machine learning (ML) model. In the last few …

Potential merits and flaws of large language models in epilepsy care: a critical review

E van Diessen, RA van Amerongen, M Zijlmans… - …, 2024 - Wiley Online Library
The current pace of development and application of large language models (LLMs) is
unprecedented and will significantly impact future medical care. In this critical review, we …

Students parrot their teachers: Membership inference on model distillation

M Jagielski, M Nasr, K Lee… - Advances in …, 2024 - proceedings.neurips.cc
Model distillation is frequently proposed as a technique to reduce the privacy
leakage of machine learning. These empirical privacy defenses rely on the intuition that …
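
The setting is standard knowledge distillation, where a student is trained to match the teacher's softened output distribution; the paper's observation is that membership signal about the teacher's training data can survive this transfer. A minimal sketch of one Hinton-style distillation step, with the temperature T an assumed hyperparameter:

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, x, T=2.0):
    """One knowledge-distillation step: KL between temperature-softened
    teacher and student distributions (scaled by T^2 to keep gradient
    magnitudes comparable across temperatures)."""
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    return F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                    F.softmax(t_logits / T, dim=-1),
                    reduction="batchmean") * T * T
```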

MACE: Mass concept erasure in diffusion models

S Lu, Z Wang, L Li, Y Liu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The rapid expansion of large-scale text-to-image diffusion models has raised growing
concerns regarding their potential misuse in creating harmful or misleading content. In this …

A survey on membership inference attacks and defenses in machine learning

J Niu, P Liu, X Zhu, K Shen, Y Wang, H Chi… - Journal of Information …, 2024 - Elsevier
Membership inference (MI) attacks aim to infer whether a data record was used to
train a target model. Due to the serious privacy risks they pose, MI attacks have been attracting a …
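
Among the baseline attacks such surveys taxonomize is the loss-threshold attack of Yeom et al.: records on which the model's loss is unusually low are predicted to be training members. A minimal sketch, assuming a calibration set of losses from known non-members:

```python
import numpy as np

def loss_threshold_attack(candidate_losses, nonmember_losses, fpr=0.05):
    """Yeom-style loss-threshold attack: training members tend to have
    unusually low loss. The threshold is calibrated on known non-members
    so that only a fraction `fpr` of them would be falsely flagged."""
    tau = np.quantile(nonmember_losses, fpr)
    return np.asarray(candidate_losses) <= tau  # True => predicted member
```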

Achilles' heels: vulnerable record identification in synthetic data publishing

M Meeus, F Guepin, AM Creţu… - European Symposium on …, 2023 - Springer
Synthetic data is seen as the most promising solution for sharing individual-level data while
preserving privacy. Shadow-modeling-based membership inference attacks (MIAs) have …
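
Shadow modeling here means repeatedly fitting generators on datasets that alternately include or exclude a target record, then training a meta-classifier to tell the two cases apart from features of the synthetic output. A sketch of that pipeline, where train_shadow and feature_fn are caller-supplied placeholders rather than parts of any published API:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def shadow_mia(train_shadow, feature_fn, target_record, n_shadows=50):
    """Shadow-modeling MIA for synthetic data: fit generators on datasets
    that alternately include/exclude the target record, featurize each
    synthetic output, and learn a meta-classifier for membership."""
    X, y = [], []
    for _ in range(n_shadows):
        for include in (False, True):
            synth = train_shadow(include_target=include)  # placeholder trainer
            X.append(feature_fn(synth, target_record))    # placeholder featurizer
            y.append(int(include))
    return RandomForestClassifier(n_estimators=100).fit(np.array(X), np.array(y))
```

The fitted classifier is then applied to features of the actually published synthetic data; records for which this attack is most accurate are the "vulnerable" ones the title refers to.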