In what will be a landmark test of how artificial intelligence (AI) can be accommodated within legal frameworks designed for a pre-AI world, the trial of Getty Images v Stability AI begins today.
Background
Getty Images (Getty) licenses its content (including photos, videos and illustrations) globally, a substantial proportion of which is protected by copyright. Stability AI (Stability) is a London-based open-source generative AI company which offers “[a] model for generating professional-grade images”. Getty claims that Stability has scraped millions of images from Getty’s website, without its consent, and used those images unlawfully as input data to train and develop Stability’s AI model (Stable Diffusion). Stability has admitted to using at least some images from Getty’s website. Getty further claims that Stable Diffusion reproduces in substantial part its copyright works.
Getty’s case
Getty claims database right infringement, trade mark infringement and passing off, but attention has been focused on its copyright claims given the potential implications for current practices in the training of AI models. In summary, Getty claims:
(1) that during the development and training of Stable Diffusion, Stability downloaded Getty’s copyright works onto computers and/or servers in the UK and this amounts to copyright infringement;
(2) secondary copyright infringement by Stability importing the pre-trained Stable Diffusion software into the UK; and
(3) that Stable Diffusion’s AI-generated images (accessed by UK users) infringe Getty’s copyright by reproducing a substantial part of its copyright works.
Stability AI’s defence
In defence of the claim, Stability argues amongst other things that:
(1) Stable Diffusion’s training took place outside the UK and its development did not involve the reproduction of Getty’s copyright works in the UK;
(2) there is no secondary copyright infringement because Stable Diffusion is not an infringing copy, is not an article and has not been imported into the UK by Stability;
(3) Stable Diffusion’s AI-generated images do not derive from Getty’s copyright works. The examples cited by Getty arose because it used prompts which substantially corresponded to the captions on its images, which is not representative of use by normal users. To the extent that any output of Stable Diffusion includes Getty’s copyright works in combination with other elements, Stability relies on the defence of pastiche (i.e. an artistic style that imitates another work or artist); and
(4) it is the user, not Stability, who generates output from Stable Diffusion. Stability has not authorised, and indeed has taken steps to prevent, any infringing acts by its users.
Analysis
If the court decides that Stability has infringed Getty’s copyright, it would be transformative for the existing norms governing the training of AI models in the UK. Model training currently depends on the consumption of vast volumes of data (including words, images and videos) from the internet in a process referred to as “scraping”. That data is generally scraped indiscriminately, regardless of its copyright status, and a large proportion of it is inevitably protected by copyright.
A finding in Getty’s favour would require developers either to cease their use of copyright works or to obtain licences for that use, significantly fettering their current approach to model training. In practice, however, such a judgment would be unlikely to prevent the misuse of copyright works in the training of AI models altogether; rather, it would encourage the export of that process to other jurisdictions. As for existing AI models already trained on copyright works scraped from the internet, any requirement to unpick that training would come as very unwelcome news to AI developers.
By contrast, a finding in Stability’s favour would cement AI developers’ ability to continue their current practices. It would also broadly align with the government’s current proposals to allow AI companies to rely on an exception for “text and data mining”, so that they can use copyright works to train their models unless the copyright holders opt out. The proposal is currently facing opposition from the House of Lords and the creative industries, who argue that it will allow AI models to free ride on creators’ works.
The case also brings into the courtroom the practical reality of interrogating an AI model and the challenges this presents for the traditional judicial process. In an earlier procedural judgment, the court emphasised that the extent to which Getty’s copyright works were used to train Stable Diffusion is a matter within the knowledge of Stability. Determining which copyright works were used in the training set, however, “would be wholly disproportionate and practically impossible without significant resource.” Rather than scrutinise the entire model, the court has ordered that Stability provide Getty with a sample of 10,000 randomly selected prompts, which Getty can compare against its image captions to identify verbatim matches. The trial will reveal how successful this approach has been and whether it sets the bar for how such models are scrutinised in future.
You can find more of our commentary on this case here.