Negative preference optimization: From catastrophic collapse to effective unlearning

R Zhang, L Lin, Y Bai, S Mei - arXiv preprint arXiv:2404.05868, 2024 - arxiv.org
Large Language Models (LLMs) often memorize sensitive, private, or copyrighted data
during pre-training. LLM unlearning aims to eliminate the influence of undesirable data from …
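
The forget-set objective the title refers to is usually written as L_NPO = (2/β) E_{(x,y)∼D_forget}[log(1 + (π_θ(y|x)/π_ref(y|x))^β)], a smooth, bounded alternative to the gradient-ascent baseline that tends to collapse. Below is a minimal PyTorch sketch of that loss, assuming the summed sequence log-probabilities are precomputed; the function name and signature are illustrative, not the paper's released code:

```python
import torch
import torch.nn.functional as F

def npo_loss(logp_theta: torch.Tensor, logp_ref: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """NPO loss on a batch of forget-set sequences.

    logp_theta: log pi_theta(y|x) summed over tokens, current model (with grad)
    logp_ref:   log pi_ref(y|x) under the frozen reference model
    Identity used: (2/beta) * log(1 + (pi_theta/pi_ref)^beta)
                 = (2/beta) * softplus(beta * (logp_theta - logp_ref)).
    As beta -> 0 this recovers plain gradient ascent on the forget set (up to
    a constant); finite beta bounds the per-example gradient, which is what
    slows the catastrophic collapse the title alludes to.
    """
    log_ratio = logp_theta - logp_ref
    return (2.0 / beta) * F.softplus(beta * log_ratio).mean()
```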

Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment

K D'Oosterlinck, W Xu, C Develder… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are often aligned using contrastive alignment objectives
and preference pair datasets. The interaction between model, paired data, and objective …
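
The contrastive objective such work starts from is DPO (Rafailov et al., 2023); the snippet does not spell out the anchored variants themselves, so the sketch below shows only the standard DPO loss as context, with a comment on the underspecification the title targets. Names and signature are illustrative:

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen: torch.Tensor, logp_rejected: torch.Tensor,
             ref_logp_chosen: torch.Tensor, ref_logp_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO contrastive loss over preference pairs.

    Each argument is a summed sequence log-probability; the ref_* values come
    from a frozen reference model. Note the underspecification: the loss only
    constrains the margin h_w - h_l, not the absolute likelihoods, so the
    chosen response's likelihood may drop as long as the rejected one drops
    faster -- the gap anchored objectives aim to close.
    """
    h_w = logp_chosen - ref_logp_chosen      # implicit reward, chosen response
    h_l = logp_rejected - ref_logp_rejected  # implicit reward, rejected response
    return -F.logsigmoid(beta * (h_w - h_l)).mean()
```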

Rethinking Entity-level Unlearning for Large Language Models

W Ma, X Feng, W Zhong, L Huang, Y Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language model unlearning has gained increasing attention due to its potential to
mitigate security and privacy concerns. Current research predominantly focuses on Instance …

Finding Safety Neurons in Large Language Models

J Chen, X Wang, Z Yao, Y Bai, L Hou, J Li - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) excel in various capabilities but also pose safety risks such
as generating harmful content and misinformation, even after safety alignment. In this paper …