CopyBench: Measuring literal and non-literal reproduction of copyright-protected text in language model generation
Evaluating the degree of reproduction of copyright-protected content by language models
(LMs) is of significant interest to the AI and legal communities. Although both literal and non …
Quantifying generalization complexity for large language models
While large language models (LLMs) have shown exceptional capabilities in understanding
complex queries and performing sophisticated tasks, their generalization abilities are often …
The Pitfalls of Memorization: When Memorization Hurts Generalization
R Bayat, M Pezeshki, E Dohmatob… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural networks often learn simple explanations that fit the majority of the data while
memorizing exceptions that deviate from these explanations. This behavior leads to poor …
Recite, reconstruct, recollect: Memorization in LMs as a multifaceted phenomenon
Memorization in language models is typically treated as a homogenous phenomenon,
neglecting the specifics of the memorized data. We instead model memorization as the effect …
Reason-and-Execute Prompting: Enhancing Multi-Modal Large Language Models for Solving Geometry Questions
Multi-Modal Large Language Models (MM-LLMs) have demonstrated powerful reasoning
abilities in various visual question-answering tasks. However, they face the challenge of …
Latent adversarial training improves robustness to persistent harmful behaviors in LLMs
Large language models (LLMs) can often be made to behave in undesirable ways that they
are explicitly fine-tuned not to. For example, the LLM red-teaming literature has produced a …
Towards more realistic extraction attacks: An adversarial perspective
Language models are prone to memorizing large parts of their training data, making them
vulnerable to extraction attacks. Existing research on these attacks remains limited in scope …
Human-inspired Perspectives: A Survey on AI Long-term Memory
With the rapid advancement of AI systems, their abilities to store, retrieve, and utilize
information over the long term, referred to as long-term memory, have become increasingly …
A probabilistic perspective on unlearning and alignment for large language models
Comprehensive evaluation of Large Language Models (LLMs) is an open research problem.
Existing evaluations rely on deterministic point estimates generated via greedy decoding …
Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning
In this work, we address the problem of large language model (LLM) unlearning, aiming to
remove unwanted data influences and associated model capabilities (e.g., copyrighted data …