As we’ve covered in several earlier articles (AI & Ownership Part 3: All Out War, AI & Ownership: IP, Competition Law, and Publicity Rights in a Changing World – Part 2, and Raw Story Media & AlterNet Media Inc. v. OpenAI), companies developing Artificial Intelligence (AI) tools, and OpenAI in particular, currently face a wave of lawsuits alleging copyright infringement under 17 U.S. Code Section 501 and/or violations of the Digital Millennium Copyright Act (DMCA), 17 U.S. Code Section 1202. These claims arise from the use of copyrighted works, without compensation, to train certain large language models (LLMs).
Among other defenses, the AI companies generally claim that this activity is protected by the doctrine of fair use.
Courts have not yet issued a substantive decision on the central question of whether training LLMs on copyrighted works (and the output those LLMs generate) constitutes infringement. However, preliminary rulings on motions to dismiss suggest that judges consider at least some of the plaintiffs’ claims substantial enough to proceed to discovery and trial.
OPENAI SEEKS CONSOLIDATION OF 8 COPYRIGHT LAWSUITS
On 6 December 2024, OpenAI petitioned the United States Judicial Panel on Multidistrict Litigation under 28 U.S. Code Section 1407 (Multidistrict Litigation) to consolidate pre-trial proceedings in eight ongoing copyright lawsuits against it. The plaintiffs in these lawsuits include authors, YouTube creators, news organisations and newspaper companies (collectively, the “Plaintiffs”). The eight cases are spread across five judges in two districts, namely the Northern District of California (NDC) and the Southern District of New York (SDNY).
OpenAI has cited the following reasons for consolidation in the NDC:
- the NDC is where OpenAI’s headquarters, witnesses, training data inspections, source code and the individuals responsible for training its LLMs are located;
- to prevent duplicative discovery, including repeated depositions of the same witnesses; and
- to avoid inconsistent pretrial rulings on substantive and significant matters of law (OpenAI pointed to conflicting opinions from two district judges within the SDNY in separate but nearly identical cases).
KEY LEGAL ISSUES IN THE OPENAI COPYRIGHT LAWSUITS
What are the substantive matters of law in the lawsuits that OpenAI is looking to consolidate?
Essentially, the Plaintiffs in these lawsuits all allege that their copyrighted works were used to train OpenAI’s LLMs, which in turn produced outputs that summarize or quote those works. Some also allege that the training sets used by OpenAI stripped out key copyright details such as the author’s name, the title, and other copyright information (collectively, “Copyright Management Information” or “CMI”). Some also assert claims for direct, vicarious, and contributory copyright infringement arising from OpenAI’s alleged use of their works to train its LLMs.
For its part, OpenAI has invoked the fair use doctrine codified in 17 U.S.C. § 107 as a defence, arguing that training LLMs is a transformative fair use, i.e. a use that adds something new, serves a different purpose or has a different character, and does not substitute for the original work.
RELATED LAWSUITS AND DEVELOPMENTS
In a previous article, we briefly examined Kadrey v. Meta Platforms, in which the court dismissed the plaintiffs’ claim that the defendant’s LLMs, having been trained on the plaintiffs’ books, were themselves infringing derivative works. The court also rejected the argument that every output generated by the LLMs was an infringing derivative work, noting that the plaintiffs had offered no evidence that specific outputs, or portions of outputs, were substantially similar to particular inputs. The court granted the plaintiffs leave to amend their claims.
In Sarah Andersen et al. v. Stability AI et al., the court denied the defendants’ motion to dismiss the plaintiffs’ copyright infringement claims, although it did dismiss many of the plaintiffs’ other claims. This indicates that the court at least believed the plaintiffs’ claims (that the defendants, various AI companies, used copies of the plaintiffs’ copyrighted works to train their AI models) had enough substance to proceed to discovery and trial.
In two closely related lawsuits, Raw Story Media v. OpenAI Inc. and The Intercept Media, Inc. v. OpenAI Inc., also discussed previously, the plaintiffs alleged that the training sets used by OpenAI had removed CMI from their copyrighted material in violation of Section 1202(b)(1) of the DMCA, and sought injunctive relief and damages.
In the Raw Story lawsuit, Judge McMahon of the SDNY ruled that Raw Story had failed to demonstrate any concrete harm arising from the alleged removal of CMI from its works during OpenAI’s LLM training process, and therefore lacked standing to bring a DMCA claim. Nevertheless, the judge allowed Raw Story Media to amend its complaint, which it did.
In contrast, in the Intercept Media lawsuit, Judge Rakoff of the SDNY allowed Intercept Media’s DMCA claim to proceed.
Thus, two SDNY judges reached different conclusions on substantially the same facts and legal issues. These, incidentally, are the two SDNY lawsuits OpenAI pointed to in seeking consolidation.
OPENAI’S DEFENSE: FAIR USE
Over the past two decades, the fair use doctrine has proven to be an effective defense for technology companies facing copyright infringement allegations. For instance, in Perfect 10, Inc. v. Amazon.com, Inc., the Ninth Circuit held that displaying thumbnail versions of copyrighted images in search engine results was fair use because that use was transformative. The court noted that while the original images served ‘entertainment, aesthetic, or informative functions,’ the search engine repurposed them as ‘pointers directing users to sources of information.’
Similarly, in Authors Guild, Inc. v. Google, Inc., the Second Circuit held that Google’s digitization of copyrighted texts for its Google Books search engine was fair use because of its highly transformative nature. Although Google scanned entire copyrighted texts, it displayed only snippets, which functioned as ‘pointers directing users to a wide range of books.’
These two cases suggest that courts are more likely to find fair use when the end product serves a functional purpose and provides significant social utility. In future articles, we will examine how OpenAI and other AI companies have relied on fair use as a defence in the lawsuits they are involved in.
Authors: Shantanu Mukherjee, Priyansha Agarwal