The WMDP benchmark: Measuring and reducing malicious use with unlearning

N Li, A Pan, A Gopal, S Yue, D Berrios, A Gatti… - arXiv preprint arXiv …, 2024 - arxiv.org
The White House Executive Order on Artificial Intelligence highlights the risks of large
language models (LLMs) empowering malicious actors in developing biological, cyber, and …

Guardrail baselines for unlearning in LLMs

P Thaker, Y Maurya, S Hu, ZS Wu, V Smith - arXiv preprint arXiv …, 2024 - arxiv.org
Recent work has demonstrated that finetuning is a promising approach to 'unlearn' concepts
from large language models. However, finetuning can be expensive, as it requires both …

A comprehensive survey of LLM alignment techniques: RLHF, RLAIF, PPO, DPO and more

Z Wang, B Bi, SK Pentyala, K Ramnath… - arXiv preprint arXiv …, 2024 - arxiv.org
With advancements in self-supervised learning, the availability of trillions of tokens in a
pre-training corpus, instruction fine-tuning, and the development of large Transformers with …
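For orientation, one of the methods named in this survey's title, DPO, is commonly stated as the following objective (a standard formulation assumed here, not quoted from the survey itself; σ is the logistic function, β a temperature hyperparameter, π_ref the frozen reference policy, and (x, y_w, y_l) a prompt paired with a preferred and a dispreferred response):

\[
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
\]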

Digital forgetting in large language models: A survey of unlearning methods

A Blanco-Justicia, N Jebreel… - Artificial Intelligence …, 2025 - Springer
Large language models (LLMs) have become the state of the art in natural language
processing. The massive adoption of generative LLMs and the capabilities they have shown …

Demystifying verbatim memorization in large language models

J Huang, D Yang, C Potts - arXiv preprint arXiv:2407.17817, 2024 - arxiv.org
Large Language Models (LLMs) frequently memorize long sequences verbatim, often with
serious legal and privacy implications. Much prior work has studied such verbatim …

Machine unlearning in generative AI: A survey

Z Liu, G Dou, Z Tan, Y Tian, M Jiang - arXiv preprint arXiv:2407.20516, 2024 - arxiv.org
Generative AI technologies have been deployed in many places, such as (multimodal) large
language models and vision generative models. Their remarkable performance should be …

Evaluating copyright takedown methods for language models

B Wei, W Shi, Y Huang, NA Smith, C Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models (LMs) derive their capabilities from extensive training on diverse data,
including potentially copyrighted material. These models can memorize and generate …

Negative preference optimization: From catastrophic collapse to effective unlearning

R Zhang, L Lin, Y Bai, S Mei - arXiv preprint arXiv:2404.05868, 2024 - arxiv.org
Large Language Models (LLMs) often memorize sensitive, private, or copyrighted data
during pre-training. LLM unlearning aims to eliminate the influence of undesirable data from …
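As a sketch of the technique this title refers to, the negative preference optimization (NPO) loss is usually written as follows (standard notation assumed rather than quoted from the abstract; β > 0 is a hyperparameter, π_ref the reference model before unlearning, and D_f the forget set):

\[
\mathcal{L}_{\mathrm{NPO}}(\theta) = \frac{2}{\beta}\,\mathbb{E}_{(x,\, y) \sim \mathcal{D}_f}\!\left[\log\!\left(1 + \left(\frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}\right)^{\!\beta}\right)\right]
\]

In the β → 0 limit this reduces to plain gradient ascent on the forget set, the unstable regime the title's 'catastrophic collapse' alludes to; a finite β keeps the loss bounded.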

Towards safer large language models through machine unlearning

Z Liu, G Dou, Z Tan, Y Tian, M Jiang - arXiv preprint arXiv:2402.10058, 2024 - arxiv.org
The rapid advancement of Large Language Models (LLMs) has demonstrated their vast
potential across various domains, attributed to their extensive pretraining knowledge and …

Towards efficient and effective unlearning of large language models for recommendation

H Wang, J Lin, B Chen, Y Yang, R Tang… - Frontiers of Computer …, 2025 - Springer
In this letter, we propose E2URec, an efficient and effective unlearning method
for LLMRec. Our method enables LLMRec to efficiently forget the specific data by only …