
Copyright infringement class action against Microsoft, OpenAI and GitHub concerning Copilot AI tool

A class action has been filed against, among others, Microsoft, OpenAI and GitHub in the US District Court for the Northern District of California.

The class action primarily relates to the use of software code hosted on GitHub, an internet hosting service for software development whose repositories commonly include code licensed under open-source licences such as GPL and Apache. This code is said to have been used to train GitHub’s "Copilot", a programming tool that lets software developers generate code quickly by offering coding suggestions based on a programmer’s natural language comments. Copilot is powered by OpenAI’s "Codex" artificial intelligence system, which is also included in the action. Copilot is a proprietary tool that programmers can use only by signing up to a subscription.

This appears to be the first US class action challenging the training and output of an AI model and is therefore likely to be keenly watched by developers, data scientists and lawyers in the field.

The press release from the claimants’ lawyers states: “By training their AI systems on public GitHub repositories […] we contend that the defendants have violated the legal rights of a vast number of creators who posted code or other work under certain open-source licenses on GitHub.”

The action is likely to involve a number of issues, including the following:

Has copyright in software hosted on GitHub been infringed? In that respect:

  • Has code been copied from GitHub in order to be used as training data for, or incorporated in, Copilot/Codex?
  • Are Copilot coding suggestions derivative works of GitHub code or do they render such derivative works as outputs?
  • Is the use made of code hosted on GitHub permitted under the US doctrine of fair use, for example because it is transformative and used for a different purpose, namely to train Copilot/Codex to generate code? There have certainly been instances in the US where text and data mining has been upheld as fair use.

Do the open-source licence terms included with the code hosted on GitHub prevent the use made of the code by Microsoft, OpenAI and GitHub? For example:

  • Do the terms preclude use of the code (i) as training data or (ii) in a way that results in it being incorporated verbatim in proprietary code such as Copilot/Codex?
  • Have those licence terms otherwise been breached – for example, by a failure to attribute the author’s name and copyright in the manner required by the open-source licences?

Do GitHub’s own Terms of Service entitle it to use the code hosted on GitHub as it has done? Notably, the lawsuit suggests that the defendants have violated their own Terms of Service.

The claimants’ lawyers have noted, “[t]his is the first step in what will be a long journey.”

We shall keep a close eye on this one.

"This is the first step in what will be a long journey. As far as we know, this is the first class-action case in the US challenging the training and output of AI systems. It will not be the last. AI systems are not exempt from the law. Those who create and operate these systems must remain accountable."

