
Real-time genomics intelligence platform 

Our project set out to revolutionise the way scientists explore genomics data by developing a powerful, cloud-native platform for real-time analysis of genomics signals combined with population-based characteristics. The solution enables researchers to rapidly construct cohorts, perform case-control comparisons, and generate insights in real time, transforming the landscape of drug target identification.

About the client 

The client is a multinational, science-led biopharmaceutical company focused on developing life-changing medicines. Under a non-disclosure agreement (NDA), the client's name remains undisclosed. With a global reach and a commitment to innovation, the client is dedicated to pushing the boundaries of medical research and development.

The challenge of revolutionising real-time genomics data analysis 

In the era of precision medicine, the ability to rapidly explore genomics data at scale has become crucial. Traditional methods of genomics data analysis are often hampered by lengthy processing times, limited real-time insights, and complex data orchestration. The client sought to overcome these barriers by building a software solution that could accelerate the exploration of genomics signals within large datasets while seamlessly integrating patient population characteristics.
Researchers required the capability to quickly define patient cohorts, perform real-time case-control comparisons, and visualise complex genomics interactions without delays. The goal was clear: transform raw genomics data into actionable insights in minutes rather than days.

Our role in the project 

To meet this challenge, our team delivered a comprehensive, cloud-native platform that provided end-to-end services, including: 

Solution architecture design

Development of a scalable, cloud-native architecture tailored to genomics data analysis at scale. 

Backend development

Creation of REST APIs within a microservices architecture using Python, enabling large-scale data ingestion, processing, and real-time cohort comparison.
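
To give a flavour of the backend design, below is a minimal sketch of what a cohort-comparison endpoint could look like. The framework choice (FastAPI), route, and request schema are illustrative assumptions, not the client's actual API.

```python
# Minimal sketch of a cohort-comparison endpoint (hypothetical API, not the client's).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="cohort-service")

class CohortQuery(BaseModel):
    case_filters: dict     # population filters defining the case group (assumed schema)
    control_filters: dict  # population filters defining the control group
    variants: list[str]    # genomic variants to compare across the two groups

@app.post("/cohorts/compare")
def compare_cohorts(query: CohortQuery) -> dict:
    # In the real service this would dispatch a distributed filter/aggregation
    # job and return summary statistics for both groups.
    case_ids = resolve_cohort(query.case_filters)
    control_ids = resolve_cohort(query.control_filters)
    return {"case_n": len(case_ids), "control_n": len(control_ids)}

def resolve_cohort(filters: dict) -> list[str]:
    # Placeholder: look up patient IDs matching the population filters.
    return []
```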

Data orchestration development

Implementation of a workflow orchestration layer leveraging Nextflow CLI, AWS Step Functions, Lambda, Batch, and messaging systems (SQS/SNS) to automate complex data pipelines.
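
As an illustration of how these pieces fit together, the sketch below shows a Lambda handler that starts one Step Functions execution per analysis request arriving via SQS. The state machine ARN, environment variable, and message schema are assumptions; the actual pipeline definition wiring Nextflow, Batch, and SNS together is not shown.

```python
# Hypothetical sketch: start one pipeline execution per SQS request message.
import json
import os
import boto3

sfn = boto3.client("stepfunctions")
STATE_MACHINE_ARN = os.environ["PIPELINE_STATE_MACHINE_ARN"]  # assumed variable

def handler(event, context):
    # SQS delivers one or more records per Lambda invocation.
    for record in event["Records"]:
        request = json.loads(record["body"])
        # Downstream, the state machine runs the Nextflow pipeline on AWS Batch
        # and publishes completion notifications via SNS (definition not shown).
        sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            name=f"run-{request['run_id']}",  # execution names must be unique
            input=json.dumps(request),
        )
```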

Frontend development

Design and delivery of a user-friendly web interface built with React, allowing scientists to define parameters, launch queries, and visualise results dynamically. 

Cloud infrastructure deployment

Deployment of a fully serverless and containerised solution on AWS, ensuring scalability and repeatability across environments through Infrastructure as Code (IaC) principles. 
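
To illustrate the IaC approach, here is a minimal AWS CDK (Python) sketch of a few of the building blocks involved; the stack layout, resource names, and permissions are illustrative assumptions rather than the client's actual infrastructure.

```python
# Illustrative CDK stack sketch (assumed resources, not the client's actual stack).
from aws_cdk import Stack, aws_lambda as _lambda, aws_s3 as s3, aws_sqs as sqs
from constructs import Construct

class GenomicsPlatformStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Landing zone for genomics datasets.
        data_bucket = s3.Bucket(self, "DataBucket", versioned=True)

        # Queue decoupling ingestion events from processing.
        ingest_queue = sqs.Queue(self, "IngestQueue")

        # Serverless compute for the API layer.
        api_fn = _lambda.Function(
            self, "ApiFunction",
            runtime=_lambda.Runtime.PYTHON_3_11,
            handler="app.handler",
            code=_lambda.Code.from_asset("backend"),
        )
        data_bucket.grant_read(api_fn)
        ingest_queue.grant_send_messages(api_fn)
```

Because the same stack definition can be deployed to every environment, the infrastructure stays repeatable by construction.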

DevOps and automation

Setup of CI/CD pipelines, automated functional and unit tests, monitoring, and cloud cost optimisation strategies.
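
As a small illustration of the automated testing, the sketch below exercises the hypothetical cohort endpoint from the backend section using FastAPI's test client; the module name and payload are assumptions.

```python
# Illustrative unit test for the hypothetical cohort endpoint sketched earlier.
from fastapi.testclient import TestClient

from cohort_service import app  # assumed module name for the sketch above

client = TestClient(app)

def test_compare_cohorts_returns_group_sizes():
    payload = {
        "case_filters": {"diagnosis": "T2D"},  # assumed filter schema
        "control_filters": {"diagnosis": "none"},
        "variants": ["rs7903146"],
    }
    response = client.post("/cohorts/compare", json=payload)
    assert response.status_code == 200
    assert {"case_n", "control_n"} <= response.json().keys()
```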

Integration with laboratory environments

Automated data ingestion from lab instruments to AWS using Storage Gateway and event-driven processing. 
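
A minimal sketch of the event-driven half of this flow is shown below: Storage Gateway surfaces instrument output as S3 objects, and each new object triggers a Lambda that hands the file to the processing pipeline. The queue URL and message format are assumptions.

```python
# Hypothetical S3-event handler for instrument files landed via Storage Gateway.
import json
import os
import urllib.parse
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["INGEST_QUEUE_URL"]  # assumed environment variable

def handler(event, context):
    for record in event["Records"]:
        # Each record describes one newly created S3 object.
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Hand the new instrument file to the downstream pipeline.
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"bucket": bucket, "key": key}),
        )
```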

Through close collaboration with stakeholders and scientists, our engineers ensured the platform met both research needs and enterprise standards. 

Key achievements 

1. Real-time data analysis

A table of approximately 0.5 million rows and 40,000 columns is searched and filtered in real time through the backend REST API (see the sketch after this list).

2. Significant time reduction

Waiting time for statistical comparisons between case and control groups was reduced from 2 days to just 2 minutes.

3. Recognition and expansion

The project became a model for other client projects, leading to new assignments and internal awards, including the Highlight of the Year 2021.
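
The real-time filtering behind the first achievement relies on distributed, column-pruned execution, for which Dask (listed under core technologies below) is a natural fit. A minimal sketch follows, with file layout, column names, and threshold all assumed for illustration.

```python
# Illustrative Dask filter over a wide table (~500k rows x 40k columns);
# paths and column names are assumptions, not the client's data model.
import dask.dataframe as dd

def filter_cohort(parquet_path: str, columns: list[str], min_score: float):
    # Read only the columns the query touches: Parquet column pruning keeps
    # I/O proportional to the query, not to the 40,000-column table.
    ddf = dd.read_parquet(parquet_path, columns=columns)
    # Lazy, partition-parallel filter; compute() materialises just the result.
    return ddf[ddf["risk_score"] >= min_score].compute()

# Example query touching two phenotype columns plus the score used in the filter:
# result = filter_cohort("s3://bucket/genomics.parquet",
#                        ["sample_id", "phenotype", "risk_score"], 0.8)
```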

Impact and results 

The solution revolutionised genomic data exploration, enabling rapid hypothesis testing in the early stages of drug discovery. Researchers can now quickly construct patient cohorts, perform real-time statistical analyses, and visualise complex data interactions, accelerating R&D processes while reducing costs. 

Core technologies 

Programming and data science: Python (backend services), Bash, PowerShell 
Data processing and orchestration: Dask (distributed data processing), Nextflow (workflow orchestration) 
Cloud and infrastructure:
· Storage: S3, Aurora RDS, ElastiCache Redis, EFS, IO-intensive EBS 
· Compute: EC2, ECS, Fargate, AWS ParallelCluster 
· Orchestration and messaging: Step Functions, Lambda, Batch, SQS, SNS, API Gateway 
· Deployment and security: CodePipeline, CodeBuild, CodeArtifact, CloudFormation, CDK, KMS, Secrets Manager, IAM, ALB, ACM, VPC, CloudWatch 
DevOps: automated CI/CD pipelines, Infrastructure as Code (IaC), infrastructure monitoring and logging

Working on a similar project?

Contact us today.