
Generative AI: DPAs ‘supervising’ Gen AI learning

Unless you've been living under a rock for the past year, you'll know that the use of generative AI tools has continued to spread like wildfire and shows no signs of slowing down.

As ever, new technologies raise new data privacy questions, but perhaps even more so when it comes to 'large language models', such as ChatGPT, given the 'internet-scale' datasets used to train them. Data privacy regulators and legislators worldwide are scrambling to find answers, using different approaches and with different results.

The European Commission has established an "EU AI Office", the European Data Protection Board launched a dedicated ChatGPT taskforce, and you may remember that the Italian Garante's initial reaction to ChatGPT was to ban it, albeit temporarily. Closer to home, the UK government is trying to position itself as a leader in all things AI by taking a pro-innovation approach to AI regulation to "unleash the significant social and economic benefits of AI".

In the UK, the Information Commissioner has taken a somewhat 'techno-optimist' approach, supporting the government's vision. At the same time, it is warning developers and deployers of AI tools to comply with data protection laws, updating its guidance, and acknowledging that more clarity is needed on specific issues. Earlier this year, it launched a consultation series on what it sees as the key generative AI questions: (1) the lawful basis for processing publicly available data to train models; (2) compliance with the purpose limitation principle throughout the generative AI lifecycle; (3) the application of the accuracy principle to training data and outputs; and (4) data subject rights.

To date, the ICO has only shared its thoughts on the first three topics, and there are few surprises: the ICO largely agrees with the approach we've seen many developers of generative AI take. For example, it confirms that 'legitimate interests' is the most appropriate lawful basis for processing publicly available information to train models, as long as sufficient risk mitigations are in place. The ICO also acknowledges that the accuracy principle is not absolute and that the need for accurate outputs will depend on the purpose for which the model will be used.

One criticism of the draft guidance is the ICO's oversimplification of how generative AI models are trained (do the diagrams remind anyone else of pizzas?). Another is that, when conducting a legitimate interests assessment, the ICO expects developers of 'base' models to consult their crystal balls and anticipate every potential downstream third-party use, which will be very difficult, if not impossible, in some cases. Limiting the 'legitimate interests' basis to downstream uses of a model that the original developer can foresee risks stifling innovation and thwarting some of generative AI's potential.

The eagerly anticipated fourth consultation will focus on data subject rights. Complying with data subject requests relating to training datasets can prove particularly challenging for developers: training datasets are usually vast and unstructured, with identifiers often removed, making it extremely difficult to isolate the personal data of a particular data subject. Even if this were possible, re-training a model each time a developer has to comply with a data subject's opt-out request would lead to disproportionate and prohibitive efforts and costs. It will be interesting to see how the ICO addresses this problem. But with so much regulatory scrutiny worldwide, perhaps another regulator will get there first.

This article is part of our Data Protection Top 10 2024 publication.

