
The value in data - your business has a lot of data; how do you use it to train an algorithm?

Although we talk about AI systems as being capable of ‘human’ or ‘intelligent’ behaviour, it is generally machine learning that enables machines to exhibit a form of ‘intelligence’. Machine learning uses data and algorithms to imitate human ‘learning’: it identifies patterns in data and gradually improves the accuracy of its predictions. This article looks at how businesses can derive value from their data by using it to train algorithms.

Machine learning looks for patterns in existing datasets which, when applied to a new dataset, allow (hopefully valuable) insights to be drawn. Good-quality, high-volume datasets are therefore often hugely valuable for training algorithms. Consider, for example, health data collected from cancer patients as part of an extensive research study, looking at survival rates in the context of wide-ranging factors. This data could have enormous scientific, societal and commercial value if used to train algorithms for clinical settings or for further research on the disease.

Beyond scientific research and any potential commercial value in using data to train algorithms, it is also worth considering the wider value that analysing good-quality datasets may provide for businesses and society at large. Businesses could use machine learning for environmental gains, for example, by sharing and analysing data relating to the optimisation of energy usage. Equally, there is a role for machine learning in promoting equality, for example, by seeking to eliminate bias in recruitment processes.

Of course, other factors may reduce the value of data by making it harder to realise that value when the data is used to train an algorithm. A complex UK/EU regulatory landscape must first be navigated. Taking cancer research as an example, many legal restrictions will apply, including in the areas of medical confidentiality and data protection. Notably, medical consents will be required, a valid lawful basis must be established under the UK/EU GDPR, and patients must be given appropriate privacy notices.

The ‘research exemption’ under GDPR may be available for some scientific research (noting the recent draft ICO guidance on this subject). Where the intention is to use a dataset for other purposes that are not covered by this exemption, such as product development or commercialisation, pseudonymisation or even anonymisation may be the only way forward, and this may detract from what an organisation is trying to achieve. Add to this the cost and complexity of an anonymisation exercise (if it is even possible) and the dataset quickly becomes a lot less valuable. Businesses will also need to be mindful of any contractual constraints on data use and of the additional protections required where data relates to children.
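
By way of illustration, one common pseudonymisation technique is to replace direct identifiers with a keyed hash. The sketch below is a minimal Python example under stated assumptions, not a compliance solution: the field names (`patient_id`, `name`), the function names and the sample key are all hypothetical, and it assumes the secret key is stored separately from the dataset. Because anyone holding the key can re-link the tokens to individuals, the output is pseudonymised rather than anonymised, and so would generally still be personal data under the UK/EU GDPR.

```python
import hashlib
import hmac


def pseudonymise_id(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same token, so records
    about one person can still be linked to each other for analysis.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_records(records, secret_key, id_field="patient_id", drop_fields=("name",)):
    """Return new records with the ID tokenised and direct identifiers dropped.

    `id_field` and `drop_fields` are hypothetical names chosen for this example.
    The input records are not modified.
    """
    result = []
    for record in records:
        # Drop fields that identify a person directly (for example, a name).
        cleaned = {k: v for k, v in record.items() if k not in drop_fields}
        # Swap the original identifier for its keyed-hash token.
        cleaned[id_field] = pseudonymise_id(str(record[id_field]), secret_key)
        result.append(cleaned)
    return result


if __name__ == "__main__":
    # Illustrative key only: in practice it would be generated securely
    # and held apart from the dataset.
    example_key = b"example-key-held-separately"
    sample = [{"patient_id": "P001", "name": "Jane Doe", "survival_months": 18}]
    print(pseudonymise_records(sample, example_key))
```

A keyed hash is used here rather than a plain hash because identifiers such as patient numbers are often short and guessable, which would make an unkeyed hash easy to reverse by trial and error.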

Taking a different example, the internet provides an almost infinite source of information and data, which may be ‘crawled’ with a view to creating databases to train algorithms. While such data may have significant and wide-ranging value, there are evident risks, given that much of the information available online is false, out-of-date and/or inaccurate. Even if an organisation can verify the data, there are a number of legal risks in carrying out web-crawling activities, which may diminish the value of any resulting datasets. Data protection is again a primary concern, but there is also the potential for infringement of copyright and database right, breach of website T&Cs, or even criminal offences under the Computer Misuse Act 1990.

Those looking to extract value from data through the use of machine learning will also need to consider developments in legislation and regulatory guidance specific to the use of AI technologies. Notably, 2021 saw the publication of the European Commission’s proposed AI Regulation, which would ban certain categories of AI systems and set standards for other categories viewed as ‘high risk’. For personal data, the ICO has produced specific guidance on how to apply data protection law to AI projects, as well as a framework for auditing AI.

That said, there are positive developments too, which may boost businesses’ ability to gain insights from their data. The EU’s current ‘European Data Strategy’ seeks to make the EU ‘a role model for a society empowered by data’ and proposes new legislation in the form of the Data Governance Act and the Data Act. The thrust is to foster access to and sharing of data across the EU, an idea also strengthened by the Commission’s White Paper on AI, which details the Commission’s proposals to promote and support the development of AI.

So, what should businesses be doing to get the best value from their data when used to train algorithms? Compliance with applicable legal requirements (of which there are plenty) and awareness of applicable regulatory guidance are of course key. However, given some of the current complexities of the legal landscape, we see many operators adopting a more pragmatic and risk-based approach to compliance in the field of data collation and use. In addition to these legal risks, businesses using data should be mindful that datasets will very often contain flaws and inaccuracies, which may lead to undesirable consequences when training algorithms. It is therefore important for such businesses to validate, cleanse and prepare data before using it for machine learning, so that its value is optimised.
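
As a rough illustration of what validating and cleansing data can involve in practice, the Python sketch below applies three basic checks before a dataset is used for training: required fields are present, numeric values fall within plausible ranges, and exact duplicates are removed. The function name, field names and ranges are hypothetical and would differ for every dataset, and the sketch assumes each record is a flat dictionary of simple values (text, numbers or empty).

```python
def clean_records(records, required_fields, valid_ranges):
    """Split records into (clean, rejected) using basic quality checks.

    required_fields: field names that must be present and non-empty.
    valid_ranges: mapping of field name to (low, high) inclusive bounds.
    Each rejected entry is (original_record, reason) so it can be reviewed.
    """
    clean, rejected, seen = [], [], set()
    for record in records:
        # Normalise text values by trimming surrounding whitespace.
        row = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}

        # Check 1: required fields must be present and non-empty.
        missing = [f for f in required_fields if row.get(f) in (None, "")]
        if missing:
            rejected.append((record, "missing: " + ", ".join(missing)))
            continue

        # Check 2: range-checked fields must be numeric and within bounds.
        bad = [
            f for f, (low, high) in valid_ranges.items()
            if f in row and not (isinstance(row[f], (int, float)) and low <= row[f] <= high)
        ]
        if bad:
            rejected.append((record, "out of range: " + ", ".join(bad)))
            continue

        # Check 3: drop exact duplicates (after normalisation).
        key = tuple(sorted(row.items()))
        if key in seen:
            rejected.append((record, "duplicate"))
            continue
        seen.add(key)
        clean.append(row)
    return clean, rejected


if __name__ == "__main__":
    sample = [
        {"age": 54, "outcome": " survived "},
        {"age": 54, "outcome": "survived"},   # duplicate once whitespace is trimmed
        {"age": 540, "outcome": "survived"},  # implausible age
        {"age": None, "outcome": "survived"}, # missing value
    ]
    kept, dropped = clean_records(sample, ["age", "outcome"], {"age": (0, 120)})
    print(len(kept), "kept;", len(dropped), "rejected")
```

Rejected records are returned with a reason rather than silently discarded, so that someone can review whether the data or the rule is at fault.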

