The European Commission has published the first draft of the General Purpose AI Code of Practice.
The Code provides guidance for providers of general purpose AI models, and of general purpose AI models with systemic risk, on how to comply with the obligations set out in Articles 53 and 55 of the EU AI Act [1] (the Act), which are due to apply from August 2025.
Part II of the Code sets out rules for providers of general purpose AI models. We examine key aspects of these below.
Article 53 of the EU AI Act - obligations for providers of general purpose AI models
To recap, Article 53 of the Act provides that providers of general purpose AI models shall:
- draw up and keep up to date the technical documentation of the model for the purpose of providing it, upon request, to the AI Office and national competent authorities (Article 53(1)(a));
- draw up, keep up to date and make available information and documentation to providers of AI systems who intend to integrate the general purpose AI model into their AI systems (Article 53(1)(b));
- put in place a policy to comply with Union law on copyright and related rights, and in particular, to identify and comply with - including through state-of-the-art technologies - a reservation of rights expressed pursuant to Article 4(3) of the DSM Directive [2] (Article 53(1)(c)); and
- draw up and make publicly available a sufficiently detailed summary about the content used for training of the general purpose AI model, according to a template provided by the AI Office (Article 53(1)(d)).
Article 53(4) provides that providers of general purpose AI models may rely on codes of practice drawn up at an EU level to demonstrate compliance with these obligations, until a harmonised standard is published.
Transparency requirements
Information required
The Code sets out a table of the documentation/information that is required for the purposes of Articles 53(1)(a) – (b) of the Act. The Code draws upon Annexes XI and XII of the Act, which set out minimum information that must be included for these purposes.
Following the provisions of the Act, different information requirements apply depending on whether the information is intended to be drawn up and kept for the purpose of providing it to the AI Office and national competent authorities upon request, or made available to downstream providers. By way of overview:
- some information is required in all cases, including, for example, general information about the provider and the model itself, the intended tasks and type and nature of AI systems in which it can be integrated, details of the model architecture and number of parameters, input and output modalities, the technical means for integrations into AI systems, and information on data used for training, testing and validation;
- some technical information is only required for downstream providers who intend to integrate the general purpose AI model into their AI systems, for example information about how the model interacts with external hardware or software, and versions of relevant software; and
- broader information is required for the AI Office and national competent authorities. This includes “greater detail” about the model architecture, information about the design specification and training process, further information about training, test and validation data, and information regarding computational resources, energy consumption and testing process and results.
The distinction between the latter two categories seems to arise in part from a recognition of the importance to general purpose AI model providers of protecting intellectual property rights and proprietary information: certain categories of commercially sensitive information need only be drawn up and kept up to date for the AI Office and national competent authorities, and not for downstream providers.
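The three tiers described above can be pictured as a simple lookup from each documentation item to its intended recipient. The sketch below is illustrative only: the item names are paraphrased from the overview above, the `Recipient` and `items_for` names are hypothetical, and the table in the Code itself is longer and more granular.

```python
from enum import Flag, auto


class Recipient(Flag):
    """Who a documentation item must be prepared for (hypothetical labels)."""
    AUTHORITIES = auto()  # AI Office and national competent authorities
    DOWNSTREAM = auto()   # providers integrating the model into AI systems
    BOTH = AUTHORITIES | DOWNSTREAM


# Illustrative items only, paraphrased from the overview above.
REQUIREMENTS = {
    "general information about provider and model": Recipient.BOTH,
    "intended tasks and types of AI system": Recipient.BOTH,
    "architecture and number of parameters": Recipient.BOTH,
    "input and output modalities": Recipient.BOTH,
    "data used for training, testing and validation": Recipient.BOTH,
    "interaction with external hardware or software": Recipient.DOWNSTREAM,
    "versions of relevant software": Recipient.DOWNSTREAM,
    "design specification and training process": Recipient.AUTHORITIES,
    "computational resources and energy consumption": Recipient.AUTHORITIES,
    "testing process and results": Recipient.AUTHORITIES,
}


def items_for(recipient: Recipient) -> list[str]:
    """Return the items that must be available to the given recipient."""
    return [item for item, who in REQUIREMENTS.items() if recipient in who]


if __name__ == "__main__":
    for item in items_for(Recipient.DOWNSTREAM):
        print(item)
```

Calling `items_for(Recipient.DOWNSTREAM)` returns the common items plus the integration-specific ones, while the commercially sensitive items tagged `AUTHORITIES` are left out.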
Information on training data
There is ongoing debate around training data and transparency. As regards data used for training, testing and validation, signatories to the Code are required to provide information including data acquisition methods (such as web crawling, data licensing and synthetic data), data processing, and specific information regarding the data used to train, test and validate the model. This obligation overlaps with the obligation under Article 53(1)(d) of the Act for providers of general purpose AI models to draw up and make publicly available a sufficiently detailed summary about the content used for training of the general purpose AI model.
Copyright rules
Copyright policy
To satisfy the obligation at Article 53(1)(c) of the Act to put in place a policy to comply with Union law on copyright and related rights, the Code provides that signatories must comply with the following:
- Draw up and implement a copyright policy: The policy should comply with EU law on copyright and related rights in line with the relevant provisions of the Code, and it should cover the “entire lifecycle” of the general purpose AI model. The Code provides that responsibility should be assigned within the organisation for implementing and overseeing the policy.
- Upstream copyright compliance: Providers of general purpose AI models must undertake reasonable copyright due diligence before entering into a contract with a third party for the use of data sets for the development of a general purpose AI model. Effectively, providers are required to interrogate the third party on how it has identified and complied with any rights reservations expressed pursuant to Article 4(3) of the DSM Directive (see below). This places the onus on the providers of general purpose AI models to ensure that rights reservations (opt outs) are complied with.
- Downstream copyright compliance: Providers of general purpose AI models (apart from SMEs) must implement reasonable measures to mitigate the risk of a downstream system or application into which a general purpose AI model is integrated generating output that infringes copyright. In particular, they should:
  - avoid “overfitting” their general purpose AI model - overfitting occurs where a model is fitted too closely to its training data, such that it struggles to generalise and may reproduce that data in its output; and
  - require any third party entity to whom they are providing the general purpose AI model to agree to take measures to avoid repeatedly generating output which is identical or recognisably similar to protected works (i.e., at risk of infringing copyright).
So general purpose AI model providers will be required to take active steps in both the development of the model upstream, and in the contractual terms downstream on which it is made available, to prevent downstream systems or applications from generating content that infringes copyright.
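By way of illustration, one crude way to flag output that is "identical or recognisably similar" to a protected work is to measure how many word sequences the two texts share. The sketch below is an assumption on our part, not a method prescribed by the Code: the function names, the five-word window and any threshold applied to the result are hypothetical, and a high score is a prompt for review rather than a legal test of infringement.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in text (lower-cased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, protected_work: str, n: int = 5) -> float:
    """Fraction of the output's n-grams that also occur in the protected work."""
    out = ngrams(output, n)
    if not out:
        # Output shorter than n words: nothing to compare.
        return 0.0
    return len(out & ngrams(protected_work, n)) / len(out)


if __name__ == "__main__":
    work = "the quick brown fox jumps over the lazy dog"
    print(overlap_ratio(work, work))
    print(overlap_ratio("a wholly unrelated sentence about something else entirely", work))
```

A model that has overfitted to its training data would tend to score high on checks of this kind, which is why the Code pairs the overfitting measure with the contractual measure on similar output.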
Complying with the limits of the TDM exception
Signatories to the Code must commit to ensuring that they have lawful access to copyright protected content and to identify and comply with rights reservations expressed pursuant to Article 4(3) of the DSM Directive.
To recap, Article 4(1) of the DSM Directive provides an exception to copyright for reproductions and extractions of lawfully accessible works for the purposes of text and data mining. However, Article 4(3) of the DSM Directive provides that the exception only applies if the use of the relevant works has not been expressly reserved by rightholders in an appropriate manner. In the case of content made publicly available online, this “appropriate manner” includes machine-readable means. There are, however, widespread concerns and uncertainty around precisely how a machine-readable opt out can be effected. In a recent decision, the Regional Court of Hamburg suggested that the plain-text language of website terms of use might be machine readable. However, reading the plain text is one thing; understanding precisely the meaning of the words used is altogether more challenging. It seems likely that a clear and intelligible standard will need to be developed for this purpose.
The Code itself provides that signatories must comply with the following:
- Respect robots.txt. Providers of general purpose AI models should only use web crawlers that read and follow instructions expressed in accordance with the Robots Exclusion Protocol (robots.txt). The Protocol is commonly used by websites wishing to deny access to crawlers. It is more of an ‘architectural courtesy’ than a technical measure, and not all crawlers respect the Protocol. There have been reports in the press about AI companies bypassing measures taken to restrict crawling.
- Findability. Providers of general purpose AI models that provide an online search engine (as defined under the Digital Services Act) or control such a provider are required to take appropriate measures to ensure that any crawler exclusion under the Robots Exclusion Protocol does not negatively affect the ability to find that content in their search engine. This is presumably designed to ensure that websites are not unduly penalised for using robots.txt to exclude crawlers.
- Rights reservations. Providers of general purpose AI models must make best efforts in accordance with “widely used industry standards” to identify and comply with other appropriate machine-readable means to express a rights reservation pursuant to Article 4(3) of the DSM Directive for publicly available online content. However, the Code goes on to state that signatories (apart from SMEs) will, where invited to do so by the Commission, engage in discussions with rightsholder representatives to develop interoperable machine-readable standards to express a rights reservation pursuant to Article 4(3) of the DSM Directive, and to identify and comply with such a reservation.
The two limbs of this commitment highlight a tension. Although the Code refers to widely used industry standards, it is unclear exactly which standards are meant, or indeed whether any exist, and the Code itself envisages that collaboration may be required to develop rights reservation standards. This is an area in which rightsholders and providers of general purpose AI models alike seek clarity.
- No crawling of piracy websites. Providers of general purpose AI models will take reasonable measures not to crawl pirated sources.
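To make the robots.txt commitment above more concrete, the sketch below shows how a crawler might check a site's instructions before fetching a page, using the `urllib.robotparser` module from the Python standard library. The robots.txt content, the crawler name `ExampleAIBot` and the `may_fetch` helper are hypothetical; a real crawler would retrieve the file from the website concerned rather than use a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one named AI crawler is excluded entirely,
# while all other crawlers are only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    print(may_fetch(ROBOTS_TXT, "ExampleAIBot", page))  # False: excluded site-wide
    print(may_fetch(ROBOTS_TXT, "OtherBot", page))      # True: only /private/ is off limits
```

As the example suggests, the Protocol only works if the crawler chooses to run such a check and abide by the result, which is why the Code frames it as a commitment by signatories.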
Copyright transparency
In addition to the above measures, signatories to the Code are required to commit to transparency regarding the measures they adopt to comply with EU law on copyright and related rights. The Code provides that signatories must comply with the following:
- Information about rights compliance. Providers of general purpose AI models must make public adequate information about the measures they adopt to identify and comply with rights reservations expressed under Article 4(3) of the DSM Directive. This needs to include the names of all crawlers used for the development of a general purpose AI model and their relevant robots.txt features, including at the time of crawling.
- Single point of contact/complaint handling. Providers of general purpose AI models are encouraged to designate a single point of contact to enable rightsholders to communicate with them electronically, and to enable rightsholders and their representatives to lodge complaints regarding the use of works for the development of a general purpose AI model. This is interesting given recent steps taken by rightsholders and collective management organisations to object to model training – an example is seen in the Statement on AI Training that has, at the time of writing, been signed by over 35,000 signatories.
- Documentation of data sources/authorisations. Providers of general purpose AI models must draw up, keep up to date and provide the AI Office upon request with information about data sources used for training, testing and validation, and about authorisations to access and use protected content for the development of AI. As mentioned above, it is interesting to see this repeated focus on training data in the Act and the Code and it is an area that copyright owners will be particularly invested in.
Next steps
The draft Code is generally high level and does not contain the level of detail that is expected in the final Code. It is envisaged that the Code will be further discussed in the coming months, with the final document expected in May 2025. We will continue to monitor for updates as the Code is refined and updated.
=========================
[1] Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (EC) No 300/2008, (EU) No 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144 and Directives 2014/90/EU, (EU) 2016/797 and (EU) 2020/1828 (Artificial Intelligence Act) (Text with EEA relevance)
[2] Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC