An archival perspective on pretraining data

MA Desai, IV Pasquetto, AZ Jacobs, D Card - Patterns, 2024 - cell.com
Alongside an explosion in research and development related to large language models,
there has been a concomitant rise in the creation of pretraining datasets—massive …

CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images

A Gokaslan, AF Cooper, J Collins… - Proceedings of the …, 2024 - openaccess.thecvf.com
We train a set of open text-to-image (T2I) diffusion models on a dataset of curated Creative-
Commons-licensed (CC) images which yields models that are competitive with Stable …

The Files are in the Computer: Copyright, Memorization, and Generative AI

AF Cooper, J Grimmelmann - arXiv preprint arXiv:2404.12590, 2024 - arxiv.org
A central issue in copyright lawsuits against generative-AI companies is the degree to which
a generative-AI model does or does not "memorize" the data it was trained on. Unfortunately …

Evaluating Copyright Takedown Methods for Language Models

B Wei, W Shi, Y Huang, NA Smith, C Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models (LMs) derive their capabilities from extensive training on diverse data,
including potentially copyrighted material. These models can memorize and generate …

Between Randomness and Arbitrariness: Some Lessons for Reliable Machine Learning at Scale (The Short Version)

AF Cooper - Available at SSRN 4860005, 2024 - afedercooper.info
This document contains the introductory chapter of the dissertation, “Between Randomness
and Arbitrariness: Some Lessons for Reliable Machine Learning at Scale,” which was …

International Scientific Report on the Safety of Advanced AI

Y Bengio, D Privitera, T Besiroglu, R Bommasani, S Casper… - 2024 - hal.science
We are in the midst of a technological revolution that will fundamentally alter the way we live,
work, and relate to one another. Artificial Intelligence (AI) promises to transform many …

The Files are in the Computer: Copyright, Memorization, and Generative AI

AF Cooper, J Grimmelmann - afedercooper.info
The Files are in the Computer: Copyright, Memorization, and Generative AI. A. Feder Cooper, James …