Two authors, Paul Tremblay and Mona Awad, have filed a class action lawsuit in the US District Court against OpenAI. The lawsuit covers a broad range of claims, namely:
- Direct copyright infringement because (i) they did not grant OpenAI permission to make copies of their books for use in the training datasets for its large language models (essentially GPT-1, GPT-2, GPT-3, GPT-3.5 and GPT-4) and (ii) on the basis that the OpenAI LLMs themselves are infringing derivative works as “they cannot function without the expressive information extracted from the plaintiff’s works”.
- Vicarious copyright infringement for each and every output rendered by the LLMs on the basis that those outputs are based upon expressive information extracted from the Plaintiffs’ works.
- Removal of Copyright Management Information on the basis that, by design, the training process for OpenAI’s LLMs does not preserve it.
- Unfair competition, under the California Business and Professions Code.
- Negligence on the basis that OpenAI owes a duty of care to the Plaintiffs, which it breached by collecting, maintaining and controlling the Plaintiffs’ and class members’ works and then training systems (including ChatGPT) on them without permission.
- Unjust enrichment by utilising access to the Plaintiffs’ works to train ChatGPT, thereby depriving the Plaintiffs and class members of the benefits of their works.
This ‘all-in’ approach to the claims is perhaps unsurprising, given that this is among the first cases of its kind to be tested in the courts. In reality, the position is far from straightforward, particularly in relation to the copyright claims. For example, in the US there is a real possibility that the use of these works in training datasets could be permitted under the fair use doctrine and therefore not infringe copyright. While copyright owners remain firmly of the view that their authorisation is required, and that they should be paid, for use of their works in training data, some AI developers take the view that the nature of the use does not warrant authorisation or payment: the works are merely being used as ‘data’ and are not consumed or enjoyed by a human as works.
Assuming it proceeds to trial, the outcome of this case will shed some light on these issues, but there are now quite a few similar actions on foot to keep an eye on, including:
- A US class action against GitHub, Microsoft and OpenAI, concerning the use of code from GitHub to train Copilot.
- A US class action against Stability AI, Midjourney and DeviantArt, concerning the use of artistic works in training data.
- A US action by Getty Images against Stability AI, concerning the use of artistic works in training data.
- A US class action against Microsoft and OpenAI, broadly centred around privacy.
- A UK action by Getty Images against Stability AI, concerning the use of artistic works in training data.