Intellectual property is one of the main issues for everyone involved in the burgeoning AI ecosystem. Developers of large generative AI models have been coming under scrutiny, and in some cases facing legal claims, regarding their use of copyright content to train their models. At the same time, organisations using large models are concerned about whether intellectual property subsists in content generated by these tools.
So what does the AI Act have to say about intellectual property? The Act was not, at least at the beginning of its journey, intended to regulate copyright. However, as the proposal advanced, large language models developed rapidly, and pressure grew to address some of the copyright concerns they gave rise to.
So where did we end up? There are three aspects of the approved AI Act which are particularly relevant to copyright.
Text and data mining and training general purpose AI models
Under Article 4 of the Digital Single Market Directive (the “DSM Directive”), it is permissible to make copies of lawfully accessible works for the purposes of text and data mining, including for commercial purposes. It is, however, open to copyright holders to “opt out” of this exception under Article 4(3). To do so, the copyright holder is required to expressly reserve the right of text and data mining to themselves. Exactly how the copyright holder does this is less clear: the DSM Directive suggests that it should be done in an “appropriate manner”, such as by machine-readable means.
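By way of illustration only, and not something mandated by the DSM Directive or the AI Act, reservations expressed by machine-readable means are in practice often signalled through conventions aimed at automated crawlers, such as robots.txt directives. The short Python sketch below assumes that convention and uses a handful of example AI crawler user agents (chosen purely for the purposes of the example); it simply checks whether a given site appears to have signalled such a reservation.

```python
# Illustrative sketch only: robots.txt is one widely used, machine-readable
# convention for telling automated crawlers (including AI crawlers) not to
# copy a site's content. It is not prescribed by the DSM Directive.
from urllib.robotparser import RobotFileParser

# Example crawler user agents, used purely for illustration.
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended"]

def tdm_reservation_signalled(site: str) -> dict:
    """Return, for each example crawler, whether the site's robots.txt disallows it."""
    parser = RobotFileParser(f"{site.rstrip('/')}/robots.txt")
    parser.read()  # fetch and parse the live robots.txt file
    return {bot: not parser.can_fetch(bot, f"{site}/") for bot in AI_CRAWLERS}

if __name__ == "__main__":
    print(tdm_reservation_signalled("https://example.com"))
```

Whether a robots.txt directive amounts to an express reservation “in an appropriate manner” for the purposes of Article 4(3) remains a matter of debate, which is precisely why the question of how to opt out is less clear.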
Prior to the final version of the AI Act, there had been some uncertainty and debate about whether the text and data mining exception could apply to acts of copying copyright works in order to train general purpose AI models. There were two reasons for this.
Firstly, text and data mining on the one hand, and AI model training on the other, are not the same activity. Text and data mining (defined in the DSM Directive as “any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations”) typically involves scraping data, extracting the relevant data and pre-processing it. AI model training generally involves taking a selected model type, applying it to a training data set and then making the necessary adjustments to fine-tune the model. Perhaps a sensible way of looking at this issue is to think of text and data mining as a necessary prelude to AI model training.
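To make the distinction concrete, the following toy Python sketch (entirely hypothetical data and steps, not any provider’s actual pipeline) separates a mining step, which generates information such as term-frequency patterns from raw text, from a training step, which fits a trivially simple model to the mined output.

```python
# Toy illustration of the distinction drawn above; the data, the "mining" and
# the "model" are invented for the example and bear no relation to how real
# general purpose AI models are built.
from collections import Counter

# 1. Text and data mining: automated analysis of text to generate information
#    (here, simple term-frequency patterns) from scraped, pre-processed documents.
raw_documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
]
term_frequencies = [Counter(doc.split()) for doc in raw_documents]

# 2. Model training: take a chosen model type (here, a unigram probability
#    table), apply it to the mined data and adjust it by accumulating the
#    counts and normalising them.
counts = Counter()
for freqs in term_frequencies:
    counts.update(freqs)
total = sum(counts.values())
unigram_model = {word: count / total for word, count in counts.items()}

print(unigram_model)  # e.g. {'the': 0.333..., 'sat': 0.166..., ...}
```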
Secondly, it has been questioned whether the EU legislators had the development of general purpose AI models in mind when framing the text and data mining exception in Article 4 of the DSM Directive (which was finalised in 2019).
The recitals of the AI Act appear to dispel the uncertainty.
Recital 105 clarifies that text and data mining techniques may be used to retrieve and analyse the vast amounts of text, images, videos and other data required to train general purpose AI models, and that this typically requires the authorisation of the copyright holder unless a copyright exception applies. The recital therefore expressly acknowledges that there is a nexus between the use of copyright works and the training of general purpose AI models. Recital 106 also expressly references Article 4(3) of the DSM Directive, requiring providers of general purpose AI models to put in place a policy to ensure that they comply with any reservation of rights under Article 4(3); this requirement is reflected in Article 53.1(c) of the AI Act.
Accordingly, whether or not the EU legislators had the development of general purpose AI models in mind when framing the text and data mining exception in Article 4, the AI Act now appears to put this issue to bed.
Transparency in relation to the use of copyright works in training materials
One of the biggest challenges for copyright holders in relation to general purpose AI models, and one the AI Act seeks to address, is transparency. The challenge is this: if copyright holders do not know whether their copyright works have been used to train a general purpose AI model, enforcing their copyright against the developer of that model is much more difficult.
Article 53.1(d) of the AI Act imposes an obligation on the developer of a general purpose AI model to draw up and make publicly available “a sufficiently detailed summary about the content used for training of the general purpose AI model” in accordance with a template provided by the EU AI Office. The nature of this “sufficiently detailed summary” is elaborated upon in recital 107. It should “be generally comprehensive in its scope instead of technically detailed to facilitate parties with legitimate interests, including copyright holders, to exercise and enforce their rights…for example by listing the main data collections or sets that went into training the model, such as large private or public databases or data archives, and by providing a narrative explanation about other data sources used.” There had been some concerns with earlier proposals that too much detail would be required from general purpose AI developers, making compliance impossible and the provision unworkable.
As it now stands, the obligation should (subject to the EU AI Office template once issued) enable developers of general purpose AI models to provide a relatively high-level explanation of their data sources, which in turn should enable copyright holders to determine whether those sources are “lawfully accessible” and whether they include their copyright works. It seems to assume, however, that the copyright holder will already know whether their work is included in a particular data source, which may not be the case.
There is a balancing act going on here. Copyright holders want as much detail as possible to make it easier for them to enforce their rights. Conversely, AI developers want reassurance that they will not face a slew of lawsuits that will make their operations unviable.
Long arm jurisdiction for copyright
Article 53.1(c) and recital 106 of the AI Act make it clear that providers who place general purpose AI models on the EU market are required to comply with EU copyright law and to put in place a policy for doing so, including identifying and complying with (using state of the art technologies) any reservation of rights made under Article 4(3) of the DSM Directive.
Recital 106 goes on to state that “Any provider placing a general-purpose AI model on the Union market should comply with this obligation, regardless of the jurisdiction in which the copyright-relevant acts underpinning the training of those general-purpose AI models take place”. This is very much a “long arm” jurisdiction. It applies the compliance obligation to providers of general purpose AI models located outside the EU, regardless of where the training of the model took place or what the copyright laws of those countries are. The justification for this approach is to ensure that no provider of a general purpose AI model gains a competitive advantage within the EU by applying lower copyright standards than those applicable in the Union.
Not so long ago, the EU legislators introduced a long arm jurisdiction under the GDPR. So it is perhaps unsurprising to see a similar approach taken in the AI Act. Of course, it may be desirable to have a uniform standard of copyright concerning general purpose AI models and the EU is clearly looking to seize the initiative in the AI Act. However, it does stretch the limits of international comity and is one of the more controversial copyright aspects of the AI Act.
To hear more from our experts on AI, visit our dedicated page here and register now for our Tech Summit 2024!