The WMDP benchmark: Measuring and reducing malicious use with unlearning

N Li, A Pan, A Gopal, S Yue, D Berrios, A Gatti… - arXiv preprint arXiv …, 2024 - arxiv.org
The White House Executive Order on Artificial Intelligence highlights the risks of large
language models (LLMs) empowering malicious actors in developing biological, cyber, and …

Guardrail baselines for unlearning in LLMs

P Thaker, Y Maurya, S Hu, ZS Wu, V Smith - arXiv preprint arXiv …, 2024 - arxiv.org
Recent work has demonstrated that finetuning is a promising approach to 'unlearn' concepts
from large language models. However, finetuning can be expensive, as it requires both …

A comprehensive survey of LLM alignment techniques: RLHF, RLAIF, PPO, DPO and more

Z Wang, B Bi, SK Pentyala, K Ramnath… - arXiv preprint arXiv …, 2024 - arxiv.org
With advancements in self-supervised learning, the availability of trillions of tokens in a
pre-training corpus, instruction fine-tuning, and the development of large Transformers with …
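For orientation, one of the methods named in this survey's title, DPO, is commonly stated as the following objective (a standard formulation assumed here, not quoted from the survey itself; σ is the logistic function, β a temperature hyperparameter, π_ref the frozen reference policy, and (x, y_w, y_l) a prompt paired with a preferred and a dispreferred response):

\[
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
\]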

Digital forgetting in large language models: A survey of unlearning methods

A Blanco-Justicia, N Jebreel… - Artificial Intelligence …, 2025 - Springer
Large language models (LLMs) have become the state of the art in natural language
processing. The massive adoption of generative LLMs and the capabilities they have shown …

Demystifying verbatim memorization in large language models

J Huang, D Yang, C Potts - arXiv preprint arXiv:2407.17817, 2024 - arxiv.org
Large Language Models (LLMs) frequently memorize long sequences verbatim, often with
serious legal and privacy implications. Much prior work has studied such verbatim …

Machine unlearning in generative AI: A survey

Z Liu, G Dou, Z Tan, Y Tian, M Jiang - arXiv preprint arXiv:2407.20516, 2024 - arxiv.org
Generative AI technologies have been deployed in many places, such as (multimodal) large
language models and vision generative models. Their remarkable performance should be …

Evaluating copyright takedown methods for language models

B Wei, W Shi, Y Huang, NA Smith, C Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models (LMs) derive their capabilities from extensive training on diverse data,
including potentially copyrighted material. These models can memorize and generate …

Negative preference optimization: From catastrophic collapse to effective unlearning

R Zhang, L Lin, Y Bai, S Mei - arXiv preprint arXiv:2404.05868, 2024 - arxiv.org
Large Language Models (LLMs) often memorize sensitive, private, or copyrighted data
during pre-training. LLM unlearning aims to eliminate the influence of undesirable data from …
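As a sketch of the technique this title refers to, the negative preference optimization (NPO) loss is usually written as follows (standard notation assumed rather than quoted from the abstract; β > 0 is a hyperparameter, π_ref the reference model before unlearning, and D_f the forget set):

\[
\mathcal{L}_{\mathrm{NPO}}(\theta) = \frac{2}{\beta}\,\mathbb{E}_{(x,\, y) \sim \mathcal{D}_f}\!\left[\log\!\left(1 + \left(\frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}\right)^{\!\beta}\right)\right]
\]

In the β → 0 limit this reduces to plain gradient ascent on the forget set, the unstable regime the title's 'catastrophic collapse' alludes to; a finite β keeps the loss bounded.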

Towards safer large language models through machine unlearning

Z Liu, G Dou, Z Tan, Y Tian, M Jiang - arXiv preprint arXiv:2402.10058, 2024 - arxiv.org
The rapid advancement of Large Language Models (LLMs) has demonstrated their vast
potential across various domains, attributed to their extensive pretraining knowledge and …

Towards efficient and effective unlearning of large language models for recommendation

H Wang, J Lin, B Chen, Y Yang, R Tang… - Frontiers of Computer …, 2025 - Springer
In this letter, we propose E2URec, an efficient and effective unlearning method
for LLMRec. Our method enables LLMRec to efficiently forget the specific data by only …