Web crawling and scraping data: what is prohibited “commercial use”?

For commercial entities that crawl the web to harvest data, there is a wide range of legal issues to consider when evaluating the risk associated with these activities. In practice, in-house lawyers are enlisting support from external counsel to work through these issues. The risk profile can vary from jurisdiction to jurisdiction. In the UK alone, in-house lawyers considering these risks may find themselves having to grapple with (at least) potential copyright and database right infringement, data privacy, possible offences under the Computer Misuse Act 1990 and compliance with website terms of use. It is the last of these (i.e. website terms of use) that has given rise to a particularly vexing question, one which has gained some currency recently, especially – although not exclusively – among commercial entities that seek to harvest input training data for Artificial Intelligence systems.

Website operators frequently include provisions in their terms of use restricting the uses that can be made of their website content. Sometimes these restrictions will be specific in the sense that they refer to a particular act that is prohibited – for example, a prohibition on making copies of content to populate a database. Very often, however, they are cast in general terms and anchored to the use that can or cannot be made of the content. A very common prohibition is on using the content of a website for commercial purposes (i.e. a general prohibition on commercial use). For commercial entities that deploy web crawling and scraping programs to harvest data, such terms can be problematic, even assuming that their data collection methods otherwise comply with any technical protocols or instructions present on the web server about what can or cannot be crawled or scraped.
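By way of illustration only, the "technical protocols/instructions" referred to above are commonly expressed in a robots.txt file, although the article does not name a specific mechanism, so that is an assumption here. The sketch below uses Python's standard `urllib.robotparser` module to check whether a given URL may be fetched under a set of robots.txt rules. The rules, the `example-bot` user agent, the `example.com` URLs and the `is_crawl_permitted` helper are all hypothetical. Passing such a check says nothing about whether the website's terms of use permit the intended use of the content.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_crawl_permitted(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url.

    This checks the technical instructions only; it does not assess
    contractual restrictions such as a prohibition on commercial use.
    """
    parser = RobotFileParser()
    # Parse the rules from text rather than fetching them over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/articles/1",
                "https://example.com/private/report"):
        allowed = is_crawl_permitted(EXAMPLE_ROBOTS_TXT, "example-bot", url)
        print(url, "allowed" if allowed else "disallowed")
```

In a real crawler the rules would be retrieved from the target site rather than hard-coded, and the result would be only one input into the wider risk assessment discussed in this article.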

An important but sometimes overlooked question is whether website terms of use that include a prohibition on commercial use are likely to be contractually binding. If the website requires the user to positively confirm acceptance of the terms of use (for example, by ticking a box stating “I accept”), and provided the other elements of contract formation (consideration, intention to create legal relations etc.) are made out, then the terms should be contractually binding. If there is no such requirement, it is likely that the website operator would seek to rely on a ‘browsewrap’ agreement (where the terms of use typically state that use of the website constitutes acceptance of those terms). However, unless the user has actual or constructive notice of the browsewrap terms of use, it is open to debate whether there is a binding contract with the user – the website owner would have to adduce compelling evidence to demonstrate that this notice requirement has been satisfied.

Assuming that the terms of use are contractually binding, the question arises: what is “commercial use”? In the UK at least, there is no accepted definition of what constitutes a “commercial use” either in legislation or case law. It is, of course, possible that the website operator could include in its terms of use a bespoke definition of “commercial use”. It might also be possible to deduce certain uses which are not intended to be covered by a term generally prohibiting commercial use. For example, the Creative Commons licence ‘Attribution – Non Commercial 4.0 International’ includes a definition of ‘non-commercial’ as “not primarily intended for or directed towards commercial advantage or monetary compensation” (see https://creativecommons.org/licenses/by-nc/4.0/legalcode). The converse conclusion is that ‘commercial use’ in this context would be anything primarily intended for or directed towards commercial advantage or monetary compensation.

If there is no definition of commercial use – which in reality is often the case – things become more complicated. There are more questions than answers, which means that there is greater legal uncertainty and it is far more challenging to profile the likely risk. Use by a private individual in a domestic setting or use by a public body for non-profit purposes might be less likely to be considered a “commercial use” by a website operator. However, should activities which merely relate to a commercial entity but have no tangible financial benefit (i.e. revenue or profit) be considered a “commercial use”? And what if those activities produce an indirect financial gain or a commercial advantage which cannot be reduced to a monetary value? Just how widely should “commercial use” be interpreted? In the majority of cases, these questions would only ever be answered definitively by a judge, once the case has been litigated and the risk has already materialised.

This sometimes leaves those crawling and harvesting data feeling like they are taking a bit of a punt: mitigating the risks as far as possible (for example, by avoiding scraping from websites that require positive acceptance of ‘click through’ terms) and hoping that any residual legal risk will not result in enforcement by the website operator. In a world where data is increasingly being recognised and realised as a valuable asset, enforcement action remains a significant possibility.
