Federated Learning in the pharmaceutical industry

a secure and scalable approach to data science

AI has been reshaping the healthcare industry, from diagnostics and clinical decision-making to surgery. However, for pharma companies, harnessing its full potential often comes with a major hurdle – data privacy and regulatory compliance. 

Centralised AI systems, which rely on vast amounts of patient data, create a significant challenge. While AI thrives on this data, medical records and clinical trial results must be rigorously protected. Without a unified protocol for data encryption and sharing in AI research, pharma companies struggle to meet strict privacy regulations. This lack of standardisation in centralised systems makes it difficult to implement AI models effectively. 

That’s where federated learning comes in as a viable option, offering a way to train AI models without compromising data security or privacy.

How does Federated Learning differ from traditional AI approaches?

Federated Learning (FL) is a decentralised approach in which multiple parties collaboratively train a single deep learning model without sharing raw data. In the pharma context, this means that each company trains the model on its own private data and shares only the resulting model updates. These updates are combined with the other parties’ updates to improve the shared model without exposing sensitive patient information, enabling privacy-preserving data analysis.

Meanwhile, traditional (i.e., centralised) AI works by collecting and storing large amounts of data from multiple sources in one central location. The data is then used to train AI models, which are processed and updated on a central server. While this approach can yield powerful insights, it concentrates sensitive information in one place and therefore demands the strictest data security and privacy measures.

As it poses fewer security risks, FL is now becoming a popular option for companies processing vast patient and clinical data.

Why Federated Learning Matters for the Pharmaceutical Industry

Let’s now take a deeper dive into the reasons why federated learning in the pharmaceutical industry is so important: 

To speed up drug development

AI and machine learning are becoming increasingly important in drug discovery, helping to improve the accuracy of models. However, many pharmaceutical companies face challenges because they don’t have enough data within their own organizations. 

For AI models to work well, they need data that’s specific to diseases, and often this data is spread across different organizations. Collaborating with other companies to share data could help create more accurate and reliable models, especially when studying rare or complex diseases, but there is a catch: most companies are reluctant to share their IP or give up a competitive advantage. Federated learning for drug discovery enables multiple organizations to train AI models collaboratively without sharing sensitive data, preserving privacy while leveraging a broader dataset for better accuracy.

To abide by regulations

Sharing data in the pharmaceutical industry is complicated by regulations like GDPR, HIPAA, and other industry-specific rules. Many companies are also cautious about sharing their data because of concerns over protecting their intellectual property and keeping a competitive edge. Projects like MELLODDY have tried to bring companies together to collaborate on data, but regulatory challenges can slow down the process and limit the number of experiments that can be done. Processing data in a HIPAA-compliant and GDPR-compliant way is essential for ensuring that data collaboration meets regulatory standards.

To overcome the limitations of traditional machine learning

Traditional machine learning methods have limitations, especially when data is stored in one place. This can lead to security risks and make it difficult for organizations to work together. Federated learning addresses these problems by allowing collaborative data analysis in pharma without sharing sensitive data. This method helps keep data secure and supports compliance with regulations, making it a promising tool in drug discovery, particularly for diseases where data is scarce.

How Federated Learning Works in a Pharma Context

FL in pharma is often described as “coopetition”. Pharmaceutical companies, often direct rivals, collaborate for mutual benefit by combining insights learned from their separate datasets. At the same time, each company keeps its raw data and servers inaccessible to the others.

Each company trains the model locally on its own data. Next, the model updates are encrypted and sent to a central server, which aggregates them.

Here’s a visualisation of how this works, based on the above-mentioned MELLODDY project:

Let’s put this into one of the most frequent use cases, i.e., drug research.

Hospitals or research institutions could each train the model locally on their own data (such as patient records, medical imaging, or clinical trial results) and send only encrypted model updates to a central server. The server then combines and averages these updates to improve the model’s predictions without seeing any personal patient details. This would allow the company to improve its drug development models, or ideate personalized treatments based on various patient segments. Let’s look at this and other scenarios further.
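
The averaging step on the central server can be sketched in a few lines. This is a minimal illustration, assuming a FedAvg-style weighted average in which each site’s contribution is weighted by its number of local samples. The function name `federated_average`, the flat-list representation of model weights, and the sample counts are hypothetical and chosen for illustration; they are not taken from any specific platform, and encryption of the updates is omitted.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client model weights, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all client updates must have the same length")

    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        share = size / total  # this client's share of all training samples
        for i, value in enumerate(weights):
            averaged[i] += share * value
    return averaged


# Hypothetical weights trained locally at three sites.
# Only these numbers reach the server; the raw records stay on site.
site_updates = [[0.0, 1.0], [1.0, 1.0], [4.0, 3.0]]
site_sizes = [100, 100, 200]

global_weights = federated_average(site_updates, site_sizes)
print(global_weights)
```

Weighting by sample count means a site with more data has proportionally more influence on the shared model. Whether that is desirable depends on the collaboration, so treat it as one possible design choice.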

Real-World Evidence (RWE) and Federated Learning in Pharma

Drug Discovery & Preclinical Research

The drug discovery process involves complex challenges, particularly understanding how molecules and proteins interact in the human body. AI plays a crucial role in improving the efficiency of this process, but biopharma companies often face limitations due to narrow datasets. Access to broader data could lead to more breakthroughs, but data sharing is not feasible due to intellectual property (IP) concerns.

Federated Learning (FL) offers a solution by enabling organizations to collaborate on improving AI models without directly sharing sensitive data. Rhino Health’s Federated Computing Platform (FCP) builds on this, allowing companies to collaborate on drug discovery while preserving IP confidentiality. The FCP also includes Federated Validation and Federated Preprocessing, which are key in drug discovery.

With FCP, biopharma companies can enhance AI-driven drug discovery, access more diverse datasets, and improve compound screening, all while maintaining data security and IP protection.

Clinical Trials & Patient Data Analysis

Federated learning is making significant strides in healthcare, particularly in clinical trials and patient data analysis. The HealthChain project, for example, uses federated learning across four hospitals in France to predict treatment responses for breast cancer and melanoma patients. Additionally, the Federated Tumour Segmentation (FeTS) initiative, involving 30 healthcare institutions worldwide, applies federated learning to improve tumor segmentation. These projects show how federated learning enables collaboration without compromising patient data privacy.

Federated learning in clinical trials allows multiple organizations to work together, analyzing diverse patient datasets without sharing sensitive information. This enhances research while ensuring compliance with data protection regulations, enabling more accurate and inclusive outcomes.

As aptly put by Jason Martin, Principal Engineer at Intel Labs:

“Federated learning has tremendous potential across numerous domains, particularly within healthcare. Its ability to protect sensitive information and data opens the door for future studies and collaboration, especially in cases where datasets would otherwise be inaccessible.”

Personalized Medicine & Predictive Analytics 

Federated learning has shown promise in personalized medicine and predictive analytics, especially when it comes to maintaining patient privacy. For example, researchers have successfully used it to differentiate healthy brain tissue from cancerous tissue in MRI images and to predict clinical outcomes for COVID-19 patients based on medical records and X-ray images. In drug discovery, scientists apply federated learning to predict chemical structures and properties, benefiting from the straightforward numerical format of chemical descriptors.

However, challenges remain, such as integrating data from different experimental methods used to determine the properties of drugs. In cell profiling for drug discovery, methods like Cell Painting capture how cells react to drugs, and federated learning can help analyze this data without sharing sensitive patient information. 

By maintaining privacy, FL helps develop personalized treatments and more accurate predictions for individual patients while also respecting their data security.

Challenges & considerations for implementing Federated Learning

While the benefits are clear, it’s also important to be aware of the challenges and specific conditions that need to be met to make the most of Federated Learning in pharma. These are:

Infrastructure & technical complexity

Establishing decentralised training frameworks calls for significant computational resources and sophisticated technical expertise. Integrating FL with existing IT and AI infrastructures in pharmaceutical settings can also be complex. 

Pharmaceutical companies should assess whether they genuinely have the technical expertise and resources to implement FL frameworks. Before starting, they should verify that they can cover the costs and estimate the expected ROI from federated learning implementations.

Communication overhead

This challenge stems from the frequent exchange of model updates between clients and the central server. To tackle this, pharma brands should consider solutions like compressing updates and sending only the most important values to reduce the amount of data transferred.

Also, techniques like model pruning and quantization shrink updates, boosting efficiency while keeping the model’s performance largely intact.
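
To make the idea of shrinking updates concrete, here is a minimal sketch of two common compression steps. It assumes top-k sparsification (sending only the largest-magnitude values, a technique related to pruning but applied to the update rather than the model) followed by uniform 8-bit quantization. The function names and the example update are hypothetical, and production systems typically use more refined schemes.

```python
from typing import Dict, List, Tuple


def top_k_sparsify(update: List[float], k: int) -> Dict[int, float]:
    """Keep only the k entries with the largest magnitude, keyed by index."""
    if k <= 0:
        return {}
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return {i: update[i] for i in sorted(ranked[:k])}


def quantize_8bit(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] plus one shared scale factor."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0] * len(values), 1.0
    scale = max_abs / 127
    return [round(v / scale) for v in values], scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Approximately recover the original floats on the server side."""
    return [c * scale for c in codes]


# Hypothetical model update with five parameters.
update = [0.02, -1.27, 0.001, 0.5, -0.003]

sparse = top_k_sparsify(update, k=2)                 # keep the 2 largest values
codes, scale = quantize_8bit(list(sparse.values()))  # one small integer per value
restored = dict(zip(sparse.keys(), dequantize(codes, scale)))
print(sparse, codes, restored)
```

The client would send only the kept indices, the integer codes, and the scale factor. The restored values are approximations, which is why compression trades some precision for lower bandwidth.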

Risk of non-IID data

Problems may arise when the devices or systems involved have different configurations and the data distributed across them does not follow the same distribution. When each device (or “node”) holds differently distributed data, the data is said to be non-IID (not Independent and Identically Distributed), which can reduce the model’s overall performance.

Training a model on such data can introduce bias, causing it to favour patterns specific to individual local datasets. To address this issue, researchers have developed strategies like FedProx, FedYogi, and SCAFFOLD. These methods improve FL’s ability to work with non-IID data, helping the model generalise better across diverse datasets.
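
As an illustration of how one of these strategies works, the sketch below shows the core idea of FedProx: each client adds a proximal term, (mu / 2) * ||w - w_global||^2, to its local loss so that local training cannot drift too far from the shared model. The one-parameter toy loss, the function names, and the values of `mu` and the learning rate are assumptions made for this example only.

```python
from typing import Callable, List


def fedprox_local_step(
    weights: List[float],
    global_weights: List[float],
    local_grad: Callable[[List[float]], List[float]],
    mu: float,
    lr: float,
) -> List[float]:
    """One gradient step on local_loss(w) + (mu / 2) * ||w - w_global||^2."""
    grads = local_grad(weights)
    return [
        w - lr * (g + mu * (w - wg))  # mu * (w - wg) is the proximal gradient
        for w, g, wg in zip(weights, grads, global_weights)
    ]


def toy_grad(w: List[float]) -> List[float]:
    """Gradient of the toy local loss 0.5 * (w - 10)^2, minimised at w = 10."""
    return [w[0] - 10.0]


def train_local(mu: float, steps: int = 200, lr: float = 0.1) -> float:
    """Run local training starting from a global model at w = 0."""
    w_global = [0.0]
    w = list(w_global)
    for _ in range(steps):
        w = fedprox_local_step(w, w_global, toy_grad, mu, lr)
    return w[0]


plain = train_local(mu=0.0)  # no proximal term: drifts to the local optimum
prox = train_local(mu=1.0)   # proximal term: stays between global and local
print(plain, prox)
```

With `mu=0.0` the client converges to its own optimum near 10, while with `mu=1.0` it settles near 5, halfway between the global model and the local optimum. Larger values of `mu` pull local models more strongly towards the shared one.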

Risk of data & model poisoning and backdoor attacks

These attacks can compromise the accuracy and reliability of AI models, potentially leading to incorrect drug predictions, faulty diagnostics, or unsafe clinical decisions. 

For pharma, this jeopardizes the trustworthiness of AI-driven research, endangers patient safety, and may lead to non-compliance with regulatory standards. Identifying and mitigating these attacks is crucial to ensure the integrity of models used in critical healthcare applications.
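
One commonly discussed mitigation is to replace plain averaging with an aggregation rule that tolerates outliers. The sketch below is an assumption for illustration rather than a method named in this article: it compares the coordinate-wise mean with the coordinate-wise median when one participant submits a manipulated update. The function names and the numbers are hypothetical.

```python
from statistics import mean, median
from typing import List, Sequence


def coordinate_mean(updates: Sequence[Sequence[float]]) -> List[float]:
    """Plain averaging: every participant can shift the result arbitrarily."""
    if not updates:
        raise ValueError("need at least one update")
    return [mean(column) for column in zip(*updates)]


def coordinate_median(updates: Sequence[Sequence[float]]) -> List[float]:
    """Median per parameter: a minority of extreme values is ignored."""
    if not updates:
        raise ValueError("need at least one update")
    return [median(column) for column in zip(*updates)]


# Four honest (hypothetical) updates and one poisoned update.
honest = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.0, 2.0]]
poisoned = honest + [[100.0, -100.0]]

mean_result = coordinate_mean(poisoned)
median_result = coordinate_median(poisoned)
print(mean_result, median_result)
```

In this toy case the mean is pulled far away from the honest updates, while the median stays at `[1.0, 2.0]`. Robust aggregation is only one layer of defence and does not by itself detect backdoors that look statistically normal.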

The role of technology providers in FL adoption

While Federated Learning offers clear benefits for the pharmaceutical industry, implementing it at scale comes with its own set of challenges. These include ensuring robust infrastructure, maintaining compliance with data protection regulations like GDPR and HIPAA, and managing the complexity of decentralised AI systems.

Success often depends on deep expertise in areas such as encryption, secure data sharing, and distributed computing. For many organisations, navigating this evolving field requires careful planning and collaboration—both within internal teams and with trusted partners across the ecosystem.

As the landscape continues to mature, Federated Learning holds great promise for those ready to invest in secure, privacy-preserving innovation.
