A recent case in the District Court of Hamburg addressed the intersection of copyright law, text and data mining (TDM), and artificial intelligence (AI) training. The claim, brought by photographer Mr Kneschke against the AI research non-profit organisation LAION e.V. (LAION), concerned LAION's use of one of Mr Kneschke's photographs in one of LAION’s AI training datasets (LAION-5B) and whether that use infringed Mr Kneschke’s copyright in his photograph.
The LAION-5B dataset is made up of 5.85 billion image-text pairs, each of which comprises a hyperlink to a publicly accessible image or image file on the Internet, together with text about the image, including a description of the image. LAION makes the LAION-5B dataset publicly available for others to use, including when training generative AI.
While there were a number of issues before the court, it ultimately sided with LAION, finding that it could rely on the TDM exception in Article 3 of the DSM Directive[1], which permits TDM for the purposes of scientific research.
Background
Mr Kneschke’s image had been displayed with a watermark on the stock image website bigstock.com where it was offered for licensing. LAION had downloaded and analysed the freely accessible watermarked image and subsequently included it in the LAION-5B dataset. Mr Kneschke alleged that LAION’s reproduction of his photograph in the LAION-5B dataset infringed his copyright.
The following questions were put before the court:
1. Did the reproductions made by LAION fall under the temporary copy exception in Article 5(1) of the InfoSoc Directive[2]?
Article 5(1) of the InfoSoc Directive is implemented in Germany by §44a UrhG. Article 5(1) states:
“1. Temporary acts of reproduction referred to in Article 2, which are transient or incidental [and] an integral and essential part of a technological process and whose sole purpose is to enable:
- a transmission in a network between third parties by an intermediary, or
- a lawful use of a work or other subject-matter to be made, and which have no independent economic significance, shall be exempted from the reproduction right provided for in Article 2.”
The court determined that the copying by LAION was not "transient or incidental". This was because the copy in question was not deleted automatically; rather, its deletion depended on action by LAION. As such, the specific duration of the storage was unclear and the copy could not be considered transient or incidental.
The court also determined that the copying by LAION was not "an integral and essential part of a technological process". This was because the image file in question was downloaded by LAION in a targeted manner in order to analyse it with software. The act of downloading did not merely accompany the technological process: it was an actively controlled process separate from the analysis.
Therefore, the court rejected LAION's argument that the copying of the photograph fell under this exception.
2. Were the reproductions permitted under the TDM exception in Article 4 of the DSM Directive?
The court expressed doubt that this exception would apply, seemingly on account of Mr Kneschke having opted his photograph out of the exception in Article 4, but declined to render a final decision on the point.
Applicability of Article 4
Article 4 of the DSM Directive provides an exception for “reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining”. This is transposed into German law by §44b UrhG.
According to Article 2(2) of the DSM Directive (as reflected in §44b(1) UrhG), “text and data mining means any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations”. In this case, the court concluded that LAION carried out the acts of reproduction in order to find such correlations, between image files and pre-existing text descriptions of image files.
The court declined to draw a distinction between, on the one hand, the copyright works themselves and, on the other hand, the information in the data of which those works are comprised. Clearly, if Article 4 could only be relied upon in respect of the reproduction of data or information, rather than of the copyright work itself, it would have a much more limited application. However, the court was not convinced there was a difference between the information in the data and the copyright work.
The court considered some arguments around the proper scope of the TDM exception in Article 4 and also noted the relevance of Article 53(1)(c) of the EU AI Act[3], which requires general purpose AI providers to identify and comply with any right-holders' opt-outs under Article 4. Since general purpose AI providers commonly perform AI training processes (and use data containing intellectual property in the process), the court held that Article 4 DSM (and thus §44b UrhG) applies to reproductions for the purpose of AI training, as well.
Ultimately the court elected to address the applicability of the exception in a narrow sense, insofar as it only considered the reproductions of copyright works by LAION for the purpose of creating the LAION-5B dataset. It did not consider the subsequent making available of that dataset by LAION to third parties for further use (including for commercial purposes).
Lawful access
The court determined that a watermarked preview image on an online stock image library (such as that depicting Mr Kneschke’s photograph) was "lawfully accessible" in the sense that it was freely accessible on the Internet.
Opt-out
bigstock.com’s Terms of Service contained the following wording, which Mr Kneschke argued was a valid opt-out from the TDM exception in line with Article 4(3) of the DSM Directive:
“YOU MAY NOT […] Use automated programs, applets, bots or the like to access the Bigstock.com website or any content thereon for any purpose, including, by way of example only, downloading Content, indexing, scraping or caching any content on the website.”
The court did not make a final decision on whether this natural language opt-out, as opposed to one made in a specific standardised machine-readable format such as the Robots Exclusion Protocol, was sufficiently ‘machine-readable’ to act as an effective opt-out from the Article 4 TDM exception. However, it expressly leaned towards agreeing with Mr Kneschke that the natural language opt-out would indeed suffice. It further clarified that the stock image website, as a licensee, would have the right to opt out of TDM on the photographer's behalf.
In its evaluation of the question on machine-readability, the court further highlighted the importance of case-by-case assessments based on the technology available at the time of use and referenced Article 53(1)(c) of the EU AI Act, which emphasises using state-of-the-art technology, including AI, to observe the opt-out provision, stating:
“Within the framework of the AI Act, the European legislator has stipulated that GPAI providers must have a strategy in place to identify and comply with an opt-out in line with Article 4(3) of the DSM Directive, including via “state-of-the-art technologies” (see Article 53(1) of the AI Act). These “state-of-the-art technologies” unequivocally include AI systems that are able to grasp the content of text written in natural language…” [310 O 227/23, section 2 b) (4), page 16]
The court also distinguished the machine-readability requirements under the DSM Directive from those under the PSI Directive[4], which does not allow for a natural language opt-out, with the difference in approach being due to the different purposes of the two directives.
3. Were the reproductions permitted under the TDM exception in Article 3 of the DSM Directive?
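By way of contrast with the natural language clause discussed above, the following is a minimal sketch of how a crawler might check a standardised opt-out expressed under the Robots Exclusion Protocol, using Python's standard-library `urllib.robotparser`. The robots.txt content, the user-agent name `example-tdm-bot` and the URL are hypothetical illustrations; none of them is taken from bigstock.com, LAION or the judgment, and the sketch says nothing about what is legally sufficient under Article 4(3).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# It bars one named crawler from the whole site and allows all others.
ROBOTS_TXT = """\
User-agent: example-tdm-bot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://stock.example/images/preview-123.jpg"  # hypothetical URL
    print(may_crawl(ROBOTS_TXT, "example-tdm-bot", url))  # False
    print(may_crawl(ROBOTS_TXT, "some-browser", url))     # True
```

A rule of this kind can be evaluated mechanically without interpreting any prose, which is the practical difference from a prohibition written into a website's terms of service.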
LAION was entitled to rely on the TDM exception in Article 3.
Article 3 of the DSM Directive permits “reproductions and extractions made by research organisations and cultural heritage institutions in order to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access”. This is transposed into German law by §60d UrhG.
The court decided that the reproduction was made for the purposes of scientific research within the meaning of the TDM exception in Article 3. Firstly, the court dismissed Mr Kneschke’s argument that simply creating a database does not lead to scientific insights and therefore would not qualify as reproduction for scientific purposes. Indeed, it found that “[the] creation of a dataset as the basis for training AI systems can be considered scientific research as it is a fundamental step for future knowledge generation" [310 O 227/23, section 3 (a), page 18].
Secondly, the court rejected Mr Kneschke’s claim that LAION's research had commercial purposes simply because some companies subsequently using the dataset operate commercially. This is because “the defendant makes the dataset created using text and data mining freely available to the public. In doing so, the defendant itself does not pursue any commercial interests with the dataset" [310 O 227/23, section 3 (b), page 19]. The court considered it irrelevant that the training set is also used by commercially active companies or that some of LAION’s employees are also part of commercially active companies.
Commentary
Kneschke v LAION offers some insights into the legal interpretation of TDM, scientific research, and AI training, particularly concerning the interplay between the TDM exception and the AI Act.
However, it is also open to criticism for at least two reasons. Firstly, the court’s approach to the TDM exception for scientific research in Article 3, in which it took a very constrained view of LAION’s purpose, is open to debate. Secondly, it is LAION that makes its dataset available to other parties to use commercially, and ‘making available’ is another of the exclusive rights of a copyright owner, over and above reproduction. The court also declined the opportunity to fully grapple with the general TDM exception in Article 4, although some of its references to that exception are helpful to a degree (such as the possibility of executing an opt-out in machine-readable plain text). There are a number of anticipated decisions which may provide some further clarity on the boundaries between copyright and AI development, such as Getty v Stability AI in the UK or The New York Times v OpenAI and Microsoft in the US.
Notably, while the decision was welcomed by AI system developers and research institutions, it has been met with criticism from some who argue that it is not aligned with established case law regarding making protected content available online (such as Svensson, GS Media, and VG Bild-Kunst), which LAION evidently does to commercial entities.
Case reference: Landgericht Hamburg, judgment of 27 September 2024 (Az.: 310 O 227/23)
See the full judgment (in German) here
Please note that the quotes from the judgment used in this article are not official translations.
[1] Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC.
[2] Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society.
[3] Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (EC) No 300/2008, (EU) No 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144 and Directives 2014/90/EU, (EU) 2016/797 and (EU) 2020/1828 (Artificial Intelligence Act).
[4] Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information (recast).