CopyBench: Measuring literal and non-literal reproduction of copyright-protected text in language model generation

T Chen, A Asai, N Mireshghallah, S Min… - arXiv preprint arXiv …, 2024 - arxiv.org
Evaluating the degree of reproduction of copyright-protected content by language models
(LMs) is of significant interest to the AI and legal communities. Although both literal and non …
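
Since this entry concerns measuring literal reproduction, a minimal sketch of one crude proxy for that quantity follows: the longest n-gram a generation shares verbatim with a source text. This is an illustration only, not CopyBench's actual metric, and the function name is hypothetical.

    def longest_shared_ngram(source: str, generation: str) -> int:
        """Length (in tokens) of the longest n-gram that the generation
        copies verbatim from the source; a crude literal-overlap proxy."""
        src, gen = source.split(), generation.split()
        best = 0
        for n in range(1, min(len(src), len(gen)) + 1):
            src_ngrams = {tuple(src[i:i + n]) for i in range(len(src) - n + 1)}
            if any(tuple(gen[i:i + n]) in src_ngrams
                   for i in range(len(gen) - n + 1)):
                best = n
            else:
                break  # no shared n-gram implies no shared (n+1)-gram
        return best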

Quantifying generalization complexity for large language models

Z Qi, H Luo, X Huang, Z Zhao, Y Jiang, X Fan… - arXiv preprint arXiv …, 2024 - arxiv.org
While large language models (LLMs) have shown exceptional capabilities in understanding
complex queries and performing sophisticated tasks, their generalization abilities are often …

The Pitfalls of Memorization: When Memorization Hurts Generalization

R Bayat, M Pezeshki, E Dohmatob… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural networks often learn simple explanations that fit the majority of the data while
memorizing exceptions that deviate from these explanations. This behavior leads to poor …

Recite, reconstruct, recollect: Memorization in LMs as a multifaceted phenomenon

US Prashanth, A Deng, K O'Brien, J SV… - arXiv preprint arXiv …, 2024 - arxiv.org
Memorization in language models is typically treated as a homogeneous phenomenon,
neglecting the specifics of the memorized data. We instead model memorization as the effect …

Reason-and-Execute Prompting: Enhancing Multi-Modal Large Language Models for Solving Geometry Questions

X Duan, D Tan, L Fang, Y Zhou, C He, Z Chen… - Proceedings of the …, 2024 - dl.acm.org
Multi-Modal Large Language Models (MM-LLMs) have demonstrated powerful reasoning
abilities in various visual question-answering tasks. However, they face the challenge of …

Latent adversarial training improves robustness to persistent harmful behaviors in LLMs

A Sheshadri, A Ewart, P Guo, A Lynch, C Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) can often be made to behave in undesirable ways that they
are explicitly fine-tuned not to. For example, the LLM red-teaming literature has produced a …

Towards more realistic extraction attacks: An adversarial perspective

Y More, P Ganesh, G Farnadi - arXiv preprint arXiv:2407.02596, 2024 - arxiv.org
Language models are prone to memorizing large parts of their training data, making them
vulnerable to extraction attacks. Existing research on these attacks remains limited in scope …
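
The basic probe behind such attacks is prefix-continuation matching: feed the model a prefix drawn from its training data and check whether greedy decoding reproduces the known continuation. A hedged sketch using the Hugging Face transformers API follows; the model choice is illustrative, and the paper's adversarial refinements go well beyond this baseline.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    def is_extractable(model, tokenizer, prefix: str, target: str) -> bool:
        """Greedy-decode a continuation of a training-data prefix and test
        whether it reproduces the known continuation verbatim."""
        inputs = tokenizer(prefix, return_tensors="pt")
        target_len = len(tokenizer(target, add_special_tokens=False)["input_ids"])
        out = model.generate(**inputs, max_new_tokens=target_len, do_sample=False)
        continuation = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                                        skip_special_tokens=True)
        return continuation.strip().startswith(target.strip())

    # e.g. model = AutoModelForCausalLM.from_pretrained("gpt2")
    #      tokenizer = AutoTokenizer.from_pretrained("gpt2")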

Human-inspired Perspectives: A Survey on AI Long-term Memory

Z He, W Lin, H Zheng, F Zhang, M Jones… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid advancement of AI systems, their abilities to store, retrieve, and utilize
information over the long term, referred to as long-term memory, have become increasingly …

A probabilistic perspective on unlearning and alignment for large language models

Y Scholten, S Günnemann, L Schwinn - arXiv preprint arXiv:2410.03523, 2024 - arxiv.org
Comprehensive evaluation of Large Language Models (LLMs) is an open research problem.
Existing evaluations rely on deterministic point estimates generated via greedy decoding …
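
The contrast this abstract draws can be made concrete: a greedy decode yields a single 0/1 observation, while the quantity of interest is the probability of emitting some target content under the model's sampling distribution. A minimal Monte Carlo sketch, where generate_fn is a hypothetical stand-in for any sampling-based decoder:

    def leakage_probability(generate_fn, prompt: str, target: str,
                            n_samples: int = 100) -> float:
        """Estimate the probability that sampled generations contain `target`,
        rather than checking a single greedy decode."""
        hits = sum(target in generate_fn(prompt) for _ in range(n_samples))
        return hits / n_samples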

Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning

C Fan, J Liu, L Lin, J Jia, R Zhang, S Mei… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we address the problem of large language model (LLM) unlearning, aiming to
remove unwanted data influences and associated model capabilities (e.g., copyrighted data …
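
For reference, the negative preference optimization (NPO) objective that this line of work revisits is commonly written as (2/beta) * E[log(1 + (pi_theta(y|x) / pi_ref(y|x))^beta)] over the forget set; the paper's proposal, as its title suggests, is a simplified variant of this objective. A PyTorch sketch, assuming per-sequence log-likelihoods are precomputed:

    import torch
    import torch.nn.functional as F

    def npo_loss(logp_theta: torch.Tensor, logp_ref: torch.Tensor,
                 beta: float = 0.1) -> torch.Tensor:
        """NPO loss on a batch of forget-set sequences. logp_theta / logp_ref:
        summed log-likelihood of each sequence under the current model and a
        frozen reference model, shape [batch]."""
        log_ratio = logp_theta - logp_ref
        # (2/beta) * log(1 + exp(beta * log_ratio)), via logsigmoid for stability
        return (-2.0 / beta) * F.logsigmoid(-beta * log_ratio).mean()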