
Integrating generative AI into the life science ecosystem – five tips for success

This article is part of our Biotech Review of the Year - Issue 11 publication

Readers will be well aware of the explosion in the use of generative AI tools that has taken place over the past year, with ChatGPT quickly becoming a household name.

Generative AI has a lot of promising applications in the life sciences sector, ranging from assisting in drug discovery and design through to preparing labelling and patient information. It is worth noting that generative AI is not the only type of AI being developed for use in the life sciences sector. For example, a study conducted by the Royal Marsden Hospital and the Institute of Cancer Research recently showed that using a non-generative AI tool in combination with radiomics was nearly twice as accurate at diagnosing a rare cancer as existing methods. However, for the purposes of this article, we will be looking at generative AI only.

In this article, we take a look at five of the key issues for life sciences organisations considering incorporating or leveraging generative AI into their products and processes and how the AI market is currently addressing each of them.

1. Protecting confidential inputs

Generative AI tools are often fine-tuned by vendors using customers’ input data to improve the tool and make the outputs iteratively more reliable and accurate. While some AI vendors’ contracts are clear that the customer will own all prompt and input data, many vendors also ask for a perpetual licence to use all customer inputs to improve the tool. Depending on what type of product or process the tool is embedded into, there is a risk that granting a licence to that data could be considered a disclosure of confidential or proprietary information (for example, where it contains confidential medical data or trade secrets). Consider an organisation using an AI tool to generate novel molecular structures: if the tool is trained on the organisation’s confidential data on which those structures are based, there is a risk the tool could reproduce the same structures for competitors, undermining the competitive advantage the original company gained from using the tool.

Before embedding an AI tool into a product or process, you should engage with the relevant internal teams to understand what data will be exposed to the tool and whether any inputs will contain confidential information or information relating to patients. While we are seeing a number of larger AI vendors design their platforms to allow customers to easily opt-out from further use of input data, we expect vendors of more bespoke solutions in the life sciences sector to be slower to offer similar opt-outs.

Where confidential information or patient data will be uploaded and no opt-out is available, we are also beginning to see some AI vendors offering the possibility of running a dedicated instance of the AI model in the customer’s environment which can help you keep control over particularly sensitive data. This of course has a cost consequence compared to public instances but may be justified for highly confidential projects or where patient data is involved.

2. Maintaining patient privacy and security

Information processed using generative AI tools in a life science context may well include personal data. For example, where a generative AI tool is used to analyse clinical trial data, this will include a large amount of patient personal data, including extensive health data which is considered special category data under the GDPR. You should carefully consider how you can use the AI tool to process personal data in accordance with data protection rules and any additional rules around patient confidentiality (such as the Caldicott Principles when processing NHS data).

The Information Commissioner’s Office has made it clear that a data protection impact assessment should be carried out before using generative AI to process personal data. This assessment should document the legal basis for processing personal data (and the condition for processing health data) and how you will address transparency obligations, and it will help you work through issues surrounding purpose limitation, data minimisation and accuracy (including bias).

Another consideration when processing clinical trial data using an AI tool is data security. While some of the bigger AI vendors are now offering a relatively strong set of security commitments, it is common for smaller AI vendors to offer no specific commitments around security. This, coupled with the highly sensitive nature of the data being stored, represents a significant risk to customers in the life sciences sector. In the absence of any contractual commitments, you should investigate whether one of the major cloud providers hosts the vendor’s services, as this may be a form of practical comfort given their approach to privacy, security and confidentiality. Alternatively, as was discussed in the context of protecting confidential information, running a dedicated instance of the AI model, hosted on your servers, may be the best way to get comfortable with the risks.

3. Ownership of intellectual property in outputs

The main output of generative AI, as the term suggests, is the content it generates. While the outputs may not be particularly valuable where the tool is used in a way that is ancillary to your primary products or processes (for example, to generate project management information more quickly), where it is used directly in the drug discovery process the outputs can attract significant value. The Benevolent platform, for example, is an AI-enabled platform used by a number of major pharmaceutical companies to identify targets or to conduct target screening more effectively. In both cases, the outputs are likely to be very valuable to the pharmaceutical company involved.

Any life sciences organisation using this type of platform will likely want to own all outputs. It is therefore crucial that you review the vendor’s platform terms to ensure they reflect that position before the platform provider is engaged. Likewise, if an AI tool were being “stacked” on top of your existing product or process, for example to enhance medical images that you had already collected, you would need to ensure that you own both the image and the enhancement so that you are free to use the outputs as you see fit.

It is worth noting that generative AI is challenging the boundaries of both copyright and patent protection, and there are questions around whether IP rights, including patent and copyright protection, subsist in generated content. In the US, the Copyright Office continues to deny protection to art generated by AI, and there is global inconsistency around whether a patent can be granted for an invention made using AI. In short, you should tread carefully if you are entirely reliant on having exclusivity over generated content.

4. Avoiding use of outputs that infringe third party IP rights

As generative AI models are trained on large volumes of data (often publicly available), there is a risk that IP infringement claims could be brought by third party rights owners against users of generative AI tools where outputs infringe their rights. This risk is higher where a model relies on publicly available data, such as a catalogue of scientific literature, and has not been developed in a way that prevents it from regurgitating data in the training dataset. 

From what we have seen, very few AI vendors offer protection, for example by providing an IP indemnity to their customers. While Microsoft has recently announced it will take responsibility if its customers are sued for copyright infringement for using its Copilots or the outputs they generate, it is notable that this protection does not extend to other types of IP (such as patents and trade marks). As such, we do not expect to see many AI vendors in the life sciences sector offering this kind of protection, particularly where it is possible for the customer to carry out freedom to operate or brand clearance searches to mitigate the risk themselves.

In any case, there are a number of ways that you can guard against the risk of a dispute arising. Understanding how the AI model was trained and how it works can help you assess the risk. Where the training dataset contains a large amount of publicly available proprietary information (for example, if the AI model is trained on a database of existing patents in the field), you should be more alert to this risk. In contrast, where the AI tool is only being used to enhance internal workflows (such as for internal data visualisation or to summarise clinical trial data), the risk of infringing third party IP is far lower. Having internal policies in place to check that any outputs are novel (for example, by conducting freedom to operate searches) will be essential in reducing the risk of a drug candidate being developed in breach of third party IP and avoiding costly patent litigation.

5. Ensuring accurate and high quality outputs

In order to produce high quality, reliable outputs, generative AI needs to be trained on a large quantity of data that is representative of the problem it is solving. For example, where an AI tool is being used to identify whether a CT scan shows an abnormality, it needs to have been trained on a large database of CT scans that have been accurately diagnosed. Similarly, in the example above of using an AI tool designed to generate a molecular structure for potential drug candidates, the AI tool must be given a deep understanding of the underlying chemistry and biology (such as a high quality dataset of molecular structures with known biological activities) if the generated molecules are to have the desired properties.

The current state of the AI market indicates that vendors are very reluctant to offer strong contractual commitments relating to performance or the quality of the outputs. The argument here is that the customer is best placed to assess whether the outputs meet their needs. Where you plan to make use of an AI tool that will have an impact on patients, there is a clear risk that overreliance on generated outputs may result in harm to the patient, for example through an incorrect diagnosis.

Before implementing an AI tool into an existing product or process, you should undertake extensive due diligence on the AI tool. The vendor should be able to give you assurances around how the model was trained, the nature of the data (in terms of volume, quality and source) and how the model generates outputs. This will help you assess whether the model is suitable for your use case. You should also consider whether the vendor is willing to develop a proof-of-concept using your own proprietary data so that you can ensure it will perform as expected, before committing to a full agreement. Unlike in the technology sector, where we are seeing a few larger vendors offering broad and adaptable functionality, we expect to see many vendors in the life sciences sector offering specific AI tools addressing specific use cases, so selecting the right one at the start of the process is key.

Bias is also a key consideration in a healthcare context. Where the AI model is not trained on data that represents the population the product or process will ultimately be used on, there is a risk of bias being introduced, leading to the possibility of discrimination against certain categories of individuals. Internal teams managing your products and processes need to be alive to this and ensure that when an AI tool is embedded, there are rigorous checks to ensure the outputs deliver what your products and processes need.
