AI SWEDEN
Stockholm mlops

A practical view on data validation for GenAI in pharma 


What does data validation for GenAI mean in the context of pharma? 

Data validation ensures that GenAI in pharma is built on accurate, standardised and trustworthy data, making outputs safe, compliant and reliable. As GenAI adoption accelerates, pharma teams need clear, practical approaches that support everyday use without compromising quality or regulatory requirements. 

The rapid growth of GenAI in pharma and biotech has recently created a clear need for open, practical conversations about safe adoption. This growing interest is also shaping industry events, including the AI and Life Sciences Meetup in Stockholm, where Marcel Kaminski from Holisticon Connect gave a talk on data validation for GenAI. We have prepared this summary to keep the discussion moving and to support teams looking for a simple, reliable way to work with GenAI in their daily practice.

What challenge is the pharma industry facing as GenAI adoption accelerates?

Interest in GenAI is rising quickly, but so is the pressure on pharma companies to keep their systems safe and compliant. Investment in AI continues to grow, and regulators are introducing stricter rules for how data and models should be managed. Many organisations see the potential of GenAI, yet they also recognise that the technology cannot deliver real value without trusted, well-prepared data. This tension between fast adoption and careful oversight shapes the current landscape and was the starting point of Marcel's presentation.

What are the main barriers preventing trustworthy GenAI in pharma?

The presentation focused on four main barriers that limit the safe and effective use of GenAI in the pharma sector:
Accuracy – when data is incomplete or inconsistent, GenAI output becomes unreliable.
Consistency – GenAI models are stochastic, which means that the same prompt can produce different answers, and model drift can change results over time.
Language and domain semantics – pharma relies on specific terminology, abbreviations and units that GenAI often mixes or interprets incorrectly.
Lack of predictive capability – GenAI is strong at explaining information and finding patterns in the current data, but forecasting requires machine learning.
These challenges make it difficult to trust GenAI without a strong validation layer.
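To make the consistency and terminology barriers more concrete, here is a minimal Python sketch. It is not taken from the talk. It assumes a hypothetical controlled vocabulary (`UNIT_TO_MG`, `ABBREVIATIONS`) and a list of answers already sampled from a model for the same prompt. Each answer is normalised to a canonical drug name and a dose in milligrams, and the function reports how many of the samples agree on that normalised answer.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical controlled vocabulary: conversion factors from each unit to milligrams.
UNIT_TO_MG = {"mg": 1.0, "g": 1000.0, "mcg": 0.001}

# Hypothetical abbreviation map; in practice it would come from the governed source of truth.
ABBREVIATIONS = {"asa": "acetylsalicylic acid", "apap": "paracetamol"}

# Assumed answer shape: "<drug name> <amount> <unit>", e.g. "ASA 500 mg".
DOSE_PATTERN = re.compile(r"^\s*([a-z ]+?)\s+(\d+(?:\.\d+)?)\s*(mg|g|mcg)\s*$", re.IGNORECASE)


def normalise(answer: str) -> tuple[str, float] | None:
    """Map one free-text answer to (canonical drug name, dose in mg), or None if it cannot be parsed."""
    match = DOSE_PATTERN.match(answer)
    if match is None:
        return None
    name, amount, unit = match.groups()
    name = name.strip().lower()
    name = ABBREVIATIONS.get(name, name)  # expand known abbreviations
    dose_mg = round(float(amount) * UNIT_TO_MG[unit.lower()], 6)  # convert to a single unit
    return name, dose_mg


def agreement(samples: list[str]) -> tuple[tuple[str, float] | None, float]:
    """Return the most common normalised answer and the share of all samples that match it."""
    normalised = [normalise(s) for s in samples]
    # Unparseable answers are excluded from the counts, so they can never count as agreeing.
    counts = Counter(n for n in normalised if n is not None)
    if not counts:
        return None, 0.0
    top, freq = counts.most_common(1)[0]
    return top, freq / len(samples)


if __name__ == "__main__":
    answers = ["ASA 500 mg", "acetylsalicylic acid 0.5 g", "Acetylsalicylic acid 500mg", "ASA 1 g"]
    print(agreement(answers))  # → (('acetylsalicylic acid', 500.0), 0.75)
```

Without the normalisation step, the first three answers would look like three different answers. With it, they agree. The acceptance threshold for the agreement share is a design choice left open here.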

A combined approach

To address these barriers, Marcel explained the need to use GenAI together with machine learning and clear business rules. Each of these elements plays a different role. GenAI helps with understanding, summarising and organising information. Machine learning supports prediction and pattern recognition. Business rules define what is acceptable, safe and compliant in a given process. When these three parts work together, they can answer a wider range of questions, including what has happened, what is likely to happen and what action should be taken next. This combined approach creates a more stable and reliable foundation for using GenAI in pharma environments.
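The division of labour can be sketched in a few lines of Python. Every name here (`genai_extract`, `ml_risk_score`, `business_rule_action`, the batch fields and the thresholds) is hypothetical. A keyword count stands in for a real GenAI extraction call, and a hand-weighted score stands in for a trained ML model. The point is only to show the three roles: GenAI describes what happened, ML estimates what is likely, and explicit rules decide what to do next.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BatchSummary:
    """Structured facts about what happened, as a GenAI step might extract them from free-text notes."""
    batch_id: str
    temperature_excursions: int
    deviation_reported: bool


def genai_extract(batch_id: str, notes: str) -> BatchSummary:
    """Stand-in for an LLM extraction call; a crude keyword count keeps the sketch deterministic."""
    text = notes.lower()
    return BatchSummary(
        batch_id=batch_id,
        temperature_excursions=text.count("temperature excursion"),
        deviation_reported="deviation" in text,
    )


def ml_risk_score(summary: BatchSummary) -> float:
    """Stand-in for a trained ML model estimating what is likely to happen; hand-set weights, capped at 1.0."""
    score = 0.2 * summary.temperature_excursions
    if summary.deviation_reported:
        score += 0.5
    return min(score, 1.0)


def business_rule_action(summary: BatchSummary, risk: float) -> str:
    """Illustrative business rules deciding what action should be taken next."""
    if summary.deviation_reported:
        return "escalate to quality assurance"
    if risk > 0.5:
        return "hold batch for review"
    return "release to next step"


def assess(batch_id: str, notes: str) -> dict:
    """Combine the three parts: GenAI for understanding, ML for prediction, rules for the decision."""
    summary = genai_extract(batch_id, notes)
    risk = ml_risk_score(summary)
    return {"batch_id": batch_id, "risk": risk, "action": business_rule_action(summary, risk)}
```

Keeping the rules outside both models means a compliance owner can read and change them without retraining or re-prompting anything.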

Introducing the data validation AI agent framework

Marcel presented a five-part framework designed to improve confidence in GenAI outputs:
Governance – data standards and the creation of a single source of truth.
Data architecture – data is cleaned, organised and prepared for use.
Machine learning layer – supports prediction and provides additional context for validation.
GenAI layer – generates and structures information.
Validation AI agent – checks outputs from both ML and GenAI against trusted reference data and business rules.
When results do not meet expectations, a human in the loop can review them and update the data or rules. This creates a controlled and transparent process that supports the safe use of GenAI.
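A validation agent's core check might look something like the following minimal Python sketch. It is not the framework's actual implementation. The reference table `REFERENCE_STRENGTH_MG`, the record fields and the rules are hypothetical examples. Each record combines an ML output (`risk_score`) with a GenAI output (`summary`). Records that pass move on, and records that fail go to a queue for human review, together with the reasons they failed.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical reference data from the single source of truth: approved product codes and strengths in mg.
REFERENCE_STRENGTH_MG = {"P-001": 500.0, "P-002": 250.0}


@dataclass
class ValidationResult:
    record_id: str
    passed: bool
    issues: list[str] = field(default_factory=list)


def validate(record: dict) -> ValidationResult:
    """Check one combined ML + GenAI output against reference data and simple business rules."""
    issues: list[str] = []
    code = record.get("product_code")
    if code not in REFERENCE_STRENGTH_MG:
        issues.append(f"unknown product_code {code!r}")
    elif record.get("strength_mg") != REFERENCE_STRENGTH_MG[code]:
        issues.append(f"strength_mg {record.get('strength_mg')!r} does not match reference")
    risk = record.get("risk_score")  # produced by the ML layer
    if not isinstance(risk, (int, float)) or not 0.0 <= risk <= 1.0:
        issues.append("risk_score missing or outside [0, 1]")
    if not str(record.get("summary", "")).strip():  # produced by the GenAI layer
        issues.append("empty summary")
    return ValidationResult(record_id=str(record.get("id", "?")), passed=not issues, issues=issues)


def route(records: list[dict], review_queue: list[ValidationResult]) -> list[dict]:
    """Pass validated records on; queue failures for human-in-the-loop review."""
    approved = []
    for record in records:
        result = validate(record)
        if result.passed:
            approved.append(record)
        else:
            review_queue.append(result)
    return approved
```

Collecting every issue, and not stopping at the first one, gives the human reviewer the full picture and makes it easier to decide whether to fix the data or the rule.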

Strengthening the foundation

A reliable GenAI system depends on a stable data foundation. Marcel explained that this begins with clean and standardised datasets, supported by well-designed pipelines. These pipelines help prepare data in a consistent way and reduce the risk of errors. Observability also plays an important role, as it allows teams to monitor system behaviour and understand where adjustments are needed. Data lineage is another key element, giving a clear record of how data moves and changes across the system. All of these components rely on maintaining a single source of truth, which ensures that every model and validation step is based on the same trusted information.
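The relationship between pipelines, lineage and observability can be illustrated with a small Python sketch. It assumes a pipeline is simply an ordered list of named steps over rows of dictionaries. Those steps (`drop_incomplete`, `standardise_units`) and the lineage record fields are hypothetical. Each step appends a lineage entry containing content fingerprints of its input and output. This lets every dataset be traced back through the steps that produced it. The row counts act as a basic observability signal.

```python
from __future__ import annotations

import hashlib
import json
from collections.abc import Callable
from datetime import datetime, timezone


def fingerprint(rows: list[dict]) -> str:
    """Short content hash of a dataset; identical data always gives the same fingerprint."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_pipeline(
    rows: list[dict],
    steps: list[tuple[str, Callable[[list[dict]], list[dict]]]],
    lineage: list[dict],
) -> list[dict]:
    """Run each named step in order and record one lineage entry per step."""
    for name, step in steps:
        input_fp = fingerprint(rows)  # hash before the step runs, in case it mutates its input
        output = step(rows)
        lineage.append({
            "step": name,
            "input": input_fp,
            "output": fingerprint(output),
            "rows_in": len(rows),
            "rows_out": len(output),  # an unexpected drop here is an observability signal
            "at": datetime.now(timezone.utc).isoformat(),
        })
        rows = output
    return rows


def drop_incomplete(rows: list[dict]) -> list[dict]:
    """Remove rows with a missing value."""
    return [r for r in rows if r.get("value") is not None]


def standardise_units(rows: list[dict]) -> list[dict]:
    """Convert every value to mg; assumes only 'mg' and 'g' occur."""
    factors = {"mg": 1.0, "g": 1000.0}
    return [{**r, "value": r["value"] * factors[r["unit"]], "unit": "mg"} for r in rows]
```

Because each entry's `output` fingerprint matches the next entry's `input`, the lineage list forms a chain. An auditor can follow this chain from a validated result back to the raw data.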

Who is responsible for each part of the GenAI validation framework?

To make the framework work in practice, Marcel outlined a clear division of responsibilities across several teams. Data governance teams manage standards and maintain the source of truth. Data engineering teams build and operate the pipelines that prepare data for use. Machine learning teams develop models that support prediction and scoring. GenAI teams create applications that handle prompts and generation tasks. A separate validation agent team focuses on checking outputs and monitoring system performance. Business stewards receive the validated results and make decisions based on them. This structure helps organisations keep control over their data and ensures that each part of the system is supported by the right expertise.

The value of validated data

Validated data brings several important benefits for pharma teams working with GenAI. It improves accuracy and consistency, which reduces the need for repeated manual checks. It also speeds up decision-making, as teams can trust the information produced by the system. Strong validation and clear lineage lower compliance risk and make it easier to meet new regulatory expectations. Most importantly, reliable data allows GenAI to be used with confidence in areas that directly influence research, development and patient safety. In a sector where trust is essential, validated data becomes a practical foundation for everyday work.

FAQ: Data validation for GenAI in pharma
What is data validation for GenAI in pharma?

Data validation ensures that GenAI systems in pharma use accurate, consistent and trusted data. It helps make AI outputs reliable, compliant and safe for use in regulated environments.

Why is data validation essential for trustworthy GenAI?

GenAI models can produce inconsistent or incorrect results and may change behaviour over time. Data validation adds a control layer that checks outputs against trusted data and business rules, reducing risk and improving confidence.

How does validated data support everyday decision-making in pharma?

Validated data improves accuracy and consistency, reduces manual verification and lowers compliance risk. This allows teams to use GenAI more efficiently in areas that directly affect research, development and patient safety.


Passion And Execution

Who We Are

At Holisticon Connect, our core values of Passion and Execution drive us toward a Promising Future. We are a hands-on tech company that places people at the centre of everything we do. Specializing in Custom Software Development, Cloud and Operations, Bespoke Data Visualisations, Engineering & Embedded services, we build trust through our promise to deliver and a no-drama approach. We are committed to delivering reliable and effective solutions, ensuring our clients can count on us to meet their needs with integrity and excellence.