Large Language Models (LLMs) are often aligned using contrastive alignment objectives and preference pair datasets. The interaction between model, paired data, and objective …
Large language model unlearning has gained increasing attention due to its potential to mitigate security and privacy concerns. Current research predominantly focuses on Instance …
Large language models (LLMs) excel in various capabilities but also pose safety risks such as generating harmful content and misinformation, even after safety alignment. In this paper …