This project set out to change how scientists explore genomics data by delivering a powerful, cloud-native platform for real-time analysis of genomics signals combined with population-based characteristics. The platform enables researchers to construct cohorts rapidly, perform case-control comparisons, and generate insights in real time, accelerating drug target identification.
The client is a multinational, science-led biopharmaceutical company focused on developing life-changing medicines. The client's name is withheld under a non-disclosure agreement (NDA). With a global reach and a commitment to innovation, the client is dedicated to pushing the boundaries of medical research and development.
In the era of precision medicine, the ability to rapidly explore genomics data at scale has become crucial. Traditional methods of genomics data analysis are often hampered by lengthy processing times, limited real-time insights, and complex data orchestration. The client sought to overcome these barriers by building a software solution that could accelerate the exploration of genomics signals within large datasets while seamlessly integrating patient population characteristics.
Researchers required the capability to quickly define patient cohorts, perform real-time case-control comparisons, and visualise complex genomics interactions without delays. The goal was clear: transform raw genomics data into actionable insights in minutes rather than days.
To meet this challenge, our team delivered a comprehensive, cloud-native platform that provided end-to-end services, including:
· Development of a scalable, cloud-native architecture tailored to genomics data analysis at scale.
· Creation of REST APIs in a microservices structure using Python, enabling large-scale data ingestion, processing, and real-time cohort comparison.
· Implementation of a workflow orchestration layer leveraging Nextflow CLI, AWS Step Functions, Lambda, Batch, and messaging systems (SQS/SNS) to automate complex data pipelines.
· Design and delivery of a user-friendly web interface built with React, allowing scientists to define parameters, launch queries, and visualise results dynamically.
· Deployment of a fully serverless and containerised solution on AWS, ensuring scalability and repeatability across environments through Infrastructure as Code (IaC) principles.
· Setup of CI/CD pipelines, automated functional and unit tests, monitoring, and cloud cost optimisation strategies.
· Automated data ingestion from lab instruments to AWS using Storage Gateway and event-driven processing.
Through close collaboration with stakeholders and scientists, our engineers ensured the platform met both research needs and enterprise standards.
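To illustrate the event-driven ingestion step listed above, here is a minimal sketch of how an upload notification might be turned into workflow inputs. It is not the project's code: the function names, the `pipeline` label, the bucket, and the sample key are all hypothetical, and the call that would start the workflow is left as a comment so the example runs without AWS access. Only the shape of the S3 event record follows the standard notification format.

```python
import json
from urllib.parse import unquote_plus


def build_workflow_inputs(s3_event: dict, pipeline: str = "genomics-ingest") -> list[dict]:
    """Turn an S3 event notification into one workflow input per uploaded object.

    The pipeline name is a hypothetical label, not taken from the project.
    """
    inputs = []
    for record in s3_event.get("Records", []):
        # Only react to object-creation events; skip deletions and other types.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        inputs.append({"pipeline": pipeline, "input_uri": f"s3://{bucket}/{key}"})
    return inputs


def handler(event, context):
    """Hypothetical Lambda entry point for the ingestion trigger."""
    inputs = build_workflow_inputs(event)
    # In a real deployment each input would be handed to the orchestrator here,
    # for example by starting a Step Functions execution. That call is omitted
    # (an assumption of this sketch) so the example runs without credentials.
    return {"scheduled": len(inputs), "inputs": inputs}


# Synthetic event with one upload and one deletion, for illustration only.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-lab-bucket"},
                "object": {"key": "runs/run+01/sample.vcf.gz"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "example-lab-bucket"},
                "object": {"key": "runs/old/sample.vcf.gz"},
            },
        },
    ]
}

if __name__ == "__main__":
    print(json.dumps(handler(sample_event, None), indent=2))
```

Keeping the event parsing in a pure function separate from the handler means the ingestion logic can be covered by unit tests without any cloud resources.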
Real-time data analysis
Approximately 0.5 million rows with 40,000 columns are searched and filtered in real time through the backend REST API.
Significant time reduction
Waiting time for statistical comparisons between case and control groups was reduced from 2 days to just 2 minutes.
Recognition and expansion
The project became a role model for other client projects, leading to new assignments and internal awards, including the Highlight of the Year 2021.
The solution revolutionised genomic data exploration, enabling rapid hypothesis testing in the early stages of drug discovery. Researchers can now quickly construct patient cohorts, perform real-time statistical analyses, and visualise complex data interactions, accelerating R&D processes while reducing costs.
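As an illustration of what a cohort-based case-control comparison involves, the sketch below selects two cohorts by a patient attribute, counts carriers of a variant in each, and applies a two-sided Fisher's exact test. The case study does not say which statistical tests the platform runs, so the choice of Fisher's exact test is an assumption, as are the function names, the `diagnosis` and `variants` fields, and the synthetic patient records.

```python
from math import comb


def select_cohort(patients: list[dict], **criteria) -> list[dict]:
    """Return the patients whose attributes match every key=value criterion."""
    return [p for p in patients if all(p.get(k) == v for k, v in criteria.items())]


def carrier_table(cases: list[dict], controls: list[dict], variant: str) -> list[list[int]]:
    """Build 2x2 counts: [[case carriers, case non-carriers],
    [control carriers, control non-carriers]]."""
    def split(group):
        carriers = sum(1 for p in group if variant in p["variants"])
        return [carriers, len(group) - carriers]
    return [split(cases), split(controls)]


def fisher_exact_two_sided(table: list[list[int]]) -> float:
    """Two-sided Fisher's exact test p-value for a 2x2 table.

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Synthetic records for illustration only; "VAR_A" is a hypothetical variant.
patients = (
    [{"diagnosis": "asthma", "variants": {"VAR_A"}}] * 3
    + [{"diagnosis": "asthma", "variants": set()}]
    + [{"diagnosis": "healthy", "variants": {"VAR_A"}}]
    + [{"diagnosis": "healthy", "variants": set()}] * 3
)

cases = select_cohort(patients, diagnosis="asthma")
controls = select_cohort(patients, diagnosis="healthy")
table = carrier_table(cases, controls, "VAR_A")
p_value = fisher_exact_two_sided(table)

if __name__ == "__main__":
    print(table, round(p_value, 4))
```

At the scale described above (hundreds of thousands of rows and tens of thousands of columns), the same logic would run on columnar, distributed data rather than Python lists; the sketch only shows the shape of the computation.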
Programming and data science: Python (backend services), Bash, PowerShell
Data processing and orchestration: Dask (distributed data processing), Nextflow (workflow orchestration)
Cloud and infrastructure:
· Storage: S3, Aurora RDS, ElastiCache Redis, EFS, IO-intensive EBS
· Compute: EC2, ECS, Fargate, AWS ParallelCluster
· Orchestration and messaging: Step Functions, Lambda, Batch, SQS, SNS, API Gateway
· Deployment and security: CodePipeline, CodeBuild, CodeArtifact, CloudFormation, CDK, KMS, Secrets Manager, IAM, ALB, ACM, VPC, CloudWatch
DevOps: automated CI/CD pipelines, Infrastructure as Code (IaC) principles, infrastructure monitoring and logging