ABOUT CLIENT:
A multinational, science-led biopharmaceutical company focused on developing life-changing medicines (name withheld under NDA).
PROJECT GOAL:
The goal of the project was to support the Client’s Center for Genomic Research unit in genome data processing. Our Client set an ambitious goal of processing 2 million genome samples by the end of 2026; to achieve this, a dedicated solution was built in the AWS cloud. This solution combines infrastructure and software technologies into a pipeline. Our team is responsible for maintaining and developing this solution to ensure its effectiveness.
Our engineering team provides top-notch maintenance services for the genomics data analysis solution: we ensure it works properly, fix bugs in the data processing logic and algorithms, and deliver new features that retrieve even more valuable data from the samples submitted to the pipeline. Our goal is to make it easier for scientists to perform further analysis and to discover mutations and correlations that can only be found at such a scale with this toolset.
Our services include:
– DevOps services – maintaining the cloud infrastructure, proactively eliminating external risks (e.g. those originating from the cloud provider), building and proposing improvements, and proposing and preparing the architecture of sub-solutions. The whole solution is cloud-native (AWS) and serverless. In most cases we use a few core AWS technologies: Step Functions, Lambda, queues, and Batch. From this set we can build almost any processing engine.
– Development – implementing bug fixes, changes, and new features depending on the stakeholders’ needs.
– Cooperation with 3rd parties: we work as one team in a Kanban model. On a daily basis, about 20 people from different parties take part, supporting each other on an equal footing in the tasks assigned by the business.
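To illustrate how the core AWS technologies named above can be combined into a processing engine, here is a minimal sketch in Python that builds a Step Functions state machine definition (Amazon States Language) chaining a Lambda task, a Batch job, and a queue message. It is not the Client's actual pipeline: the state names, the `sample_id` field, the job name, and all function arguments are hypothetical, and the code only constructs the definition without calling AWS.

```python
import json


def build_state_machine(validate_fn_arn, job_queue_arn, job_definition_arn, results_queue_url):
    """Return an Amazon States Language definition as a plain dict.

    All ARNs and the queue URL are supplied by the caller; the state names
    and the input field `sample_id` are assumptions made for illustration.
    """
    return {
        "Comment": "Illustrative sample-processing flow (hypothetical)",
        "StartAt": "ValidateSample",
        "States": {
            # Lambda: lightweight check of the incoming sample description.
            "ValidateSample": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": validate_fn_arn, "Payload.$": "$"},
                "ResultPath": "$.validation",
                "Next": "RunAnalysis",
            },
            # Batch: long-running compute; ".sync" waits for the job to finish.
            "RunAnalysis": {
                "Type": "Task",
                "Resource": "arn:aws:states:::batch:submitJob.sync",
                "Parameters": {
                    "JobName": "sample-analysis",
                    "JobQueue": job_queue_arn,
                    "JobDefinition": job_definition_arn,
                },
                "ResultPath": "$.analysis",
                "Next": "NotifyDownstream",
            },
            # SQS: hand the finished sample over to the next consumer.
            "NotifyDownstream": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sqs:sendMessage",
                "Parameters": {
                    "QueueUrl": results_queue_url,
                    "MessageBody.$": "$.sample_id",
                },
                "End": True,
            },
        },
    }


def state_order(definition):
    """Follow the Next links from StartAt and return the state names in order."""
    order = []
    name = definition["StartAt"]
    while True:
        order.append(name)
        state = definition["States"][name]
        if state.get("End"):
            return order
        name = state["Next"]


if __name__ == "__main__":
    definition = build_state_machine(
        "example-validate-fn", "example-job-queue", "example-job-def", "example-queue-url"
    )
    print(json.dumps(definition, indent=2))
    print(state_order(definition))
```

The resulting JSON could be passed to whatever deployment tooling is in use (for example as the definition of a state machine resource); how the real solution is deployed is not described here.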
The Genomics Data Analysis Solution implements multiple architectural patterns and is designed to provide a streamlined process for analyzing genome samples. It is an orchestrated pipeline: data (genome samples) is fed in at one end, processed, and emitted as structured output at the other.
Internally it consists of many steps, but the genomic data processing logic can be divided into the following stages:
– Validation: data arriving from various sources is validated before it is processed by the pipeline.
– Ingestion: data is ingested into the pipeline for processing from various sources.
– Secondary Analysis: performed using licensed software to refine the data further.
– Tertiary Analysis: more detailed variant analysis that generates valuable data in an easy-to-understand format.
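The stages listed above can be pictured as an ordered chain of steps applied to each sample. The following Python sketch is a simplified illustration only: the `Sample` type, its fields, and the stage bodies are hypothetical placeholders, and the real stages run as cloud services rather than in-process functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sample:
    """Hypothetical record describing one genome sample moving through the pipeline."""
    sample_id: str
    source: str
    history: List[str] = field(default_factory=list)


def validate(sample: Sample) -> Sample:
    # Reject obviously malformed input before any processing happens.
    if not sample.sample_id:
        raise ValueError("sample_id is required")
    sample.history.append("validation")
    return sample


def ingest(sample: Sample) -> Sample:
    # Placeholder: the real stage would copy the data into the pipeline's storage.
    sample.history.append("ingestion")
    return sample


def secondary_analysis(sample: Sample) -> Sample:
    # Placeholder: the real stage runs licensed software to refine the data.
    sample.history.append("secondary_analysis")
    return sample


def tertiary_analysis(sample: Sample) -> Sample:
    # Placeholder: the real stage performs more detailed variant analysis.
    sample.history.append("tertiary_analysis")
    return sample


# The order of this list defines the order in which the stages run.
STAGES: List[Callable[[Sample], Sample]] = [
    validate,
    ingest,
    secondary_analysis,
    tertiary_analysis,
]


def run_pipeline(sample: Sample) -> Sample:
    """Apply every stage in order; an exception in any stage stops the run."""
    for stage in STAGES:
        sample = stage(sample)
    return sample


if __name__ == "__main__":
    result = run_pipeline(Sample(sample_id="S-001", source="lab-a"))
    print(result.history)
```

Keeping the stages in a single ordered list makes it straightforward to add or reorder steps without touching the loop that drives them.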
The Genomics Data Analysis Solution enables the processing of vast amounts of data, resulting in a wealth of valuable information that scientists can easily visualize and comprehend.
This information provides insights into different genetic correlations (e.g., between mutations and specific diseases), leading to new ideas on how to prevent or treat mutations. This research may ultimately result in the development of novel therapies and drugs to address genetic diseases.
– Python
– React
– AWS
– Terraform
– Kubernetes
– Node.js
– SQL