Rethinking machine unlearning for large language models

S Liu, Y Yao, J Jia, S Casper, N Baracaldo… - arXiv preprint arXiv …, 2024 - arxiv.org
We explore machine unlearning (MU) in the domain of large language models (LLMs),
referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence …

Challenging forgets: Unveiling the worst-case forget sets in machine unlearning

C Fan, J Liu, A Hero, S Liu - European Conference on Computer Vision, 2025 - Springer
The trustworthy machine learning (ML) community is increasingly recognizing the crucial
need for models capable of selectively 'unlearning' data points after training. This leads to the …

Learning to unlearn for robust machine unlearning

MH Huang, LG Foo, J Liu - European Conference on Computer Vision, 2025 - Springer
Machine unlearning (MU) seeks to remove knowledge of specific data samples
from trained models without the necessity for complete retraining, a task made challenging …

BiasAlert: A plug-and-play tool for social bias detection in LLMs

Z Fan, R Chen, R Xu, Z Liu - arXiv preprint arXiv:2407.10241, 2024 - arxiv.org
Evaluating bias in Large Language Models (LLMs) is increasingly crucial with
their rapid development. However, existing evaluation methods rely on fixed-form outputs …

Editable fairness: Fine-grained bias mitigation in language models

R Chen, Y Li, J Yang, JT Zhou, Z Liu - arXiv preprint arXiv:2408.11843, 2024 - arxiv.org
Generating fair and accurate predictions plays a pivotal role in deploying large language
models (LLMs) in the real world. However, existing debiasing methods inevitably generate …

MultiDelete for multimodal machine unlearning

J Cheng, H Amiri - European Conference on Computer Vision, 2025 - Springer
Machine Unlearning removes specific knowledge about training data samples from
an already trained model. It has significant practical benefits, such as purging private …

CURE4Rec: A benchmark for recommendation unlearning with deeper influence

C Chen, J Zhang, Y Zhang, L Zhang, L Lyu, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
With increasing privacy concerns in artificial intelligence, regulations have mandated the
right to be forgotten, granting individuals the right to withdraw their data from models …

Modality-fair preference optimization for trustworthy MLLM alignment

S Jiang, Y Zhang, R Chen, Y Jin, Z Liu - arXiv preprint arXiv:2410.15334, 2024 - arxiv.org
Direct Preference Optimization (DPO) is effective for aligning large language models (LLMs),
but when applied to multimodal large language models (MLLMs), it often favors text over image information …

Measuring and Mitigating Stereotype Bias in Language Models: An Overview of Debiasing Techniques

Z Sokolová, M Harahus, J Staš… - 2024 International …, 2024 - ieeexplore.ieee.org
This paper provides an overview of methods for measuring the stereotype bias of pre-trained
language models. It explains the term 'stereotype bias' and its measurement. A …

MU-Bench: A multitask multimodal benchmark for machine unlearning

J Cheng, H Amiri - arXiv preprint arXiv:2406.14796, 2024 - arxiv.org
Recent advancements in Machine Unlearning (MU) have introduced solutions to selectively
remove certain training samples, such as those with outdated or sensitive information, from …