Media reports abound of AI companies entering into licensing deals with media organisations and with image, video and other content libraries to enable the use of their content to train AI models.
These deals are being negotiated against a backdrop of uncertainty and litigation regarding the use of works for AI model training, and in the face of action from rightsholders to prevent content being used without consent. AI developers are increasingly looking to secure access to content, and to partner with third parties, amid reports of a decline in the amount of publicly available material that can be used to train AI models.
AI model training and the use of IP-protected content
Training AI models such as large language models requires access to enormous amounts of content, including text, images, video and data. To date, many AI systems have been trained on publicly available content, with AI developers scraping the internet to obtain training material.
However, issues arise where content protected by IP rights, including copyright, is used without consent. There is a raft of litigation in this area in the US, while in the UK, Getty Images brought a case last year against Stability AI alleging infringement of its IP rights in relation to the training of the automatic image generator Stable Diffusion.
Aside from litigation, rightsholders are increasingly putting in place measures to prevent their works from being used without consent, including by setting up paywalls, changing terms of service and using robots.txt files on websites to block AI bots. At EU level, the EU AI Act clarifies that providers of general-purpose AI models will need to obtain authorisation to carry out text and data mining, a technique that may be used extensively in model training, where a rightsholder has reserved their rights in order to opt out of certain text and data mining (please see our article on this here).
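By way of illustration only, the robots.txt measure mentioned above can be sketched in a few lines of Python using the standard library's `urllib.robotparser`. The crawler name `ExampleAIBot`, the other bot name and the `example.com` URL are hypothetical placeholders, not the names of any real crawler or site. A robots.txt file is also a request that well-behaved crawlers choose to honour, rather than a technical barrier.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"  # placeholder URL
    print("ExampleAIBot allowed:", is_allowed(ROBOTS_TXT, "ExampleAIBot", page))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

In practice a rightsholder would list the user-agent strings published by the specific AI crawlers they wish to exclude, and would typically pair this with contractual measures such as updated terms of service.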
Licensing deals
Amidst these developments, developers have been looking to contract with organisations to license content for AI model training.
The most widely reported deals in this space to date are with news publishers, including a host of international news organisations such as Associated Press, Axel Springer, Le Monde, the Financial Times, News Corp, Vox Media, the Atlantic, Time and Condé Nast. It is not just news publishers that are signing, however: deals have also been reported with companies including Shutterstock and Reddit. Separately, a growing number of companies are being set up to license content for AI model training.
Developers will hope that these agreements can secure access to content and prevent litigation, while giving them a competitive edge over their rivals. Depending on the nature of each specific deal, they may also hope to leverage the brand power of the partnering organisations. There can also be practical benefits to taking a licence, for example being able to stipulate the format of content.
Content providers will hope to monetise their IP, and to set clear parameters around what developers may use and for what purpose. They may also hope to direct traffic to their own sites and potentially to use the relevant AI technology for their own products and services. Given the uncertainties regarding the legal position, and the developing nature of the technology, we suspect there will continue to be debate among rightsholders as to whether to license, and the form this should take.
In the UK, the Society of Authors and the Creators' Rights Alliance have both recently written open letters (see here and here) to AI developers to assert their rights. Both letters urge developers to agree terms with rightsholders, and point to the fact that licensing opportunities exist and are being developed.
It will be interesting to see how the market for licensing advances, and in particular how efforts to put in place collective licences may play out. As with everything AI, the situation is developing at pace, and we look forward to seeing how this evolves.