OpenAI Ordered to Hand Over 20M ChatGPT Logs in NYT Copyright Case

In brief

The ruling compels OpenAI to provide 20 million chat logs after months of disputes over privacy, preservation, and scope.
Judge Ona T. Wang ruled that the sample size is “proportional” to what the case needs to show whether ChatGPT outputs reproduced Times content.
The case joins a growing wave of copyright challenges aimed at how AI labs source and use training data.

A federal magistrate judge has ordered OpenAI to turn over roughly 20 million de-identified ChatGPT logs to The New York Times and other plaintiffs, deepening the AI development company’s exposure to an array of copyright and data governance disputes.

Issued on Wednesday in New York, the order denies OpenAI’s bid to block the production of user-chat records and directs the company to hand over the logs under a protective framework.

The outcome could shape how tech firms such as OpenAI, Anthropic, and Perplexity source training data, license content, and build guardrails around what their systems can output.

While the court “recognizes that the privacy concerns of OpenAI’s users are sincere,” such concerns “are only one factor in the proportionality analysis, and cannot predominate where there is clear relevance and minimal burden,” U.S. Magistrate Judge Ona T. Wang wrote.

Decrypt has reached out to both parties for comment.

The order stems from the Times’ ongoing lawsuit, which alleges that OpenAI’s models were trained on copyrighted news content without permission. The suit was first filed in December 2023.

In January last year, OpenAI challenged the NYT’s claims and filed a countersuit, claiming that the publication was not “telling the full story.”

The court later found that the 20 million chat log samples in question are “proportional to the needs of the case” to assess whether ChatGPT outputs copied the NYT’s material.

Over the past year, the dispute has intensified, with plaintiffs pressing for broad access to output data, and OpenAI warning that expansive production of these materials would create privacy and operational burdens.

In June, OpenAI faced another setback when the court ordered the company to preserve a wide range of ChatGPT user data for the lawsuit, including chats users may have already deleted.

Months later, in October, the dispute resurfaced, with the court flagging OpenAI’s October 20 filing (ECF 679), which challenged the production of the 20 million log sample, and ordering both sides to submit clarifications on why they disagree.

At the time, the judge pressed the parties to explain how the fight related to earlier concerns over deleted logs and whether OpenAI had backed away from prior agreements on what it had said it would turn over.

Late last month, OpenAI filed a formal objection asking the district judge to overturn the magistrate judge’s discovery order.

The company argued that the ruling was “clearly erroneous” and “disproportionate,” in that it would force the company to disclose millions of private user conversations, according to a court document shared with Decrypt by an OpenAI representative.

The dispute arises as part of a broader offensive against AI labs, with authors, news organizations, music publishers, and code repositories seeking to test how far existing copyright law extends when models ingest and reproduce protected material.

Courts across the U.S. and Europe are now sorting through similar claims.