What HEOR Can Learn from Pharma Data Engineering

Epictetus taught that we should focus on what lies within our control. In Health Economics and Outcomes Research (HEOR), the quality of the evidence is within our control, but only if we are willing to treat the data infrastructure as seriously as the analytical methods built on top of it.

I have spent a significant amount of time building pharmaceutical data pipelines: cleaning drug records, harmonising product names, building matching logic, and auditing datasets for consistency. It is not glamorous work, but it determines whether everything built on top of it can be trusted.

HEOR asks some of the most consequential questions in healthcare. Does this treatment actually work outside a trial? What does it cost society? Who benefits and who does not? The credibility of those answers rests entirely on the quality of the data feeding them. And in my experience, the data is often the weakest part of the whole exercise.

Real-world data is not born clean

HEOR increasingly depends on real-world evidence drawn from claims data, electronic health records, and pharmacy dispensing records. The promise is compelling: instead of relying solely on controlled trial conditions, you can study how treatments perform in real patient populations over realistic time horizons.

But real-world data reflects the real world. It is collected by people under time pressure, in systems designed for billing and inventory rather than research, using terminologies that vary across facilities, regions, and time periods. Drug names are inconsistent. Diagnoses are coded differently depending on who entered them. Dosing records are incomplete. Patient identifiers are missing or duplicated.

HEOR typically handles this through manual data cleaning followed by statistical adjustments designed to compensate for the limitations of the input. This approach works, to a degree. But it is slow, expensive, difficult to reproduce, and often opaque to the stakeholders who need to trust the results. When two researchers clean the same dataset differently and arrive at different conclusions, that is not a methodological disagreement. It is a data infrastructure failure.

Cleaning data manually before every study is like mopping the floor while the tap is still running. The problem is upstream, and that is where the solution belongs.

Data engineering thinks about this differently

A data engineer does not wait until analysis time to discover that the same drug is recorded under forty different names. That problem gets designed out of the system before data reaches the analytical layer. Standardisation logic lives in the pipeline. Transformations are logged and versioned. Every record can be traced back to its source. If a cleaning decision turns out to be wrong, you can find it, correct it, and know exactly what was affected downstream.
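To make this concrete, here is a minimal Python sketch of standardisation living in the pipeline rather than in each analysis. The synonym map, the `RULES_VERSION` label, the `DrugNamePipeline` class, and the record IDs are all hypothetical illustrations, not a production vocabulary or any specific tool. Each call records the raw value, the cleaned value, the rule-set version, and the source record ID, and anything the map does not recognise is flagged rather than guessed.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical label for the current rule set; change it whenever SYNONYMS changes
RULES_VERSION = "drug-names-v1"

# Hypothetical synonym map: normalised raw text -> canonical drug name
SYNONYMS = {
    "paracetamol": "paracetamol",
    "acetaminophen": "paracetamol",
    "amoxicillin": "amoxicillin",
    "amoxil": "amoxicillin",
}


@dataclass
class AuditEntry:
    source_id: str              # traces the value back to its source record
    raw_value: str              # exactly what arrived from the source system
    clean_value: Optional[str]  # None means unmatched and flagged for review
    rules_version: str          # which rule set produced clean_value


@dataclass
class DrugNamePipeline:
    audit_log: List[AuditEntry] = field(default_factory=list)

    @staticmethod
    def normalise(raw: str) -> str:
        text = raw.lower()
        # Drop strengths such as "500mg" or "250 mg" and common dosage-form words
        text = re.sub(r"\d+(\.\d+)?\s*(mg|g|ml)\b", " ", text)
        text = re.sub(r"\b(tab|tabs|tablet|tablets|cap|caps|capsule|capsules)\b", " ", text)
        # Keep letters only, then collapse repeated whitespace
        text = re.sub(r"[^a-z]+", " ", text)
        return " ".join(text.split())

    def standardise(self, source_id: str, raw: str) -> Optional[str]:
        clean = SYNONYMS.get(self.normalise(raw))  # no guessing: unknown stays None
        self.audit_log.append(AuditEntry(source_id, raw, clean, RULES_VERSION))
        return clean


records: List[Tuple[str, str]] = [
    ("rx-001", "Paracetamol 500mg Tab"),
    ("rx-002", "ACETAMINOPHEN"),
    ("rx-003", "Amoxil 250 mg caps"),
    ("rx-004", "Panadol Extra"),
]

pipeline = DrugNamePipeline()
cleaned = [pipeline.standardise(sid, raw) for sid, raw in records]
needs_review = [e.source_id for e in pipeline.audit_log if e.clean_value is None]

print(cleaned)       # ['paracetamol', 'paracetamol', 'amoxicillin', None]
print(needs_review)  # ['rx-004']
```

Because every decision lands in `audit_log`, a synonym that later proves wrong can be traced: filtering the log by `raw_value` and `rules_version` shows exactly which source records it touched. "Panadol Extra" is a brand name this illustrative map does not cover, so it is routed to review instead of being silently mapped.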

This is not how most HEOR data preparation works. But it could be, and the field would be stronger for it.

Three practices HEOR could borrow directly from data engineering would change the discipline significantly. First, building pipelines instead of doing one-off cleaning for each study means standardisation work compounds over time rather than being repeated from scratch for every new project. Second, treating audit trails as a standard requirement means every transformation is documented, every cleaning decision is reviewable, and reproducibility becomes a property of the infrastructure rather than a goal researchers strain toward. Third, involving data engineering at study design rather than at data extraction means the question of whether the needed data actually exists, in what form and with what reliability, gets asked before the analytical model is built around assumptions the data cannot support.

Why this matters for Africa specifically

Multi-country HEOR studies across sub-Saharan Africa are rare partly because the data infrastructure makes them so difficult. Pipelines that harmonise data across different national registries, coding systems, and languages are exactly what would make these studies viable at scale. That is a data engineering problem, and it has a data engineering solution.
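Harmonisation of this kind can be sketched as a crosswalk from each country's local product codes to one shared vocabulary. In this minimal Python sketch the country codes, local product codes, and dispensing counts are hypothetical, and the WHO ATC classification is used as the target vocabulary purely for illustration. Rows without a mapping are returned separately instead of being dropped, so coverage gaps stay visible.

```python
from collections import defaultdict

# Hypothetical crosswalk: (country, local product code) -> ATC code
CROSSWALK = {
    ("NG", "NG-PCM-500"): "N02BE01",  # paracetamol
    ("NG", "NG-AMX-250"): "J01CA04",  # amoxicillin
    ("SN", "SN-0412"): "N02BE01",     # "paracétamol" in a French-language registry
    ("SN", "SN-0977"): "J01CA04",     # "amoxicilline"
}


def harmonise(rows):
    """Map registry rows onto one vocabulary; return (harmonised, unmapped)."""
    harmonised, unmapped = [], []
    for row in rows:
        atc = CROSSWALK.get((row["country"], row["local_code"]))
        if atc is None:
            unmapped.append(row)  # kept for review rather than silently dropped
        else:
            harmonised.append({**row, "atc": atc})
    return harmonised, unmapped


# Hypothetical dispensing rows from two national registries
rows = [
    {"country": "NG", "local_code": "NG-PCM-500", "dispensed": 120},
    {"country": "SN", "local_code": "SN-0412", "dispensed": 80},
    {"country": "SN", "local_code": "SN-0977", "dispensed": 45},
    {"country": "SN", "local_code": "SN-9999", "dispensed": 10},
]

harmonised, unmapped = harmonise(rows)

# Cross-country totals only make sense once both registries share codes
totals = defaultdict(int)
for row in harmonised:
    totals[row["atc"]] += row["dispensed"]

print(dict(totals))   # {'N02BE01': 200, 'J01CA04': 45}
print(len(unmapped))  # 1
```

The crosswalk is data rather than code, so it can be versioned and reviewed by people who know each national registry, while the pipeline logic stays the same across countries.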

This is a call for collaboration, not replacement

HEOR requires deep methodological expertise that takes years to develop, in areas such as causal inference, health utility measurement, and economic modelling. I am not suggesting that data engineers should run HEOR studies. I am suggesting that the two disciplines need each other far more than current practice reflects.

At DataFestAfrica 2024, I met practitioners who were excited about exactly this intersection of healthcare and data. The energy in that room confirmed something I already believed: healthcare needs more than algorithms. It needs practitioners who understand both the clinical context and the analytical tools to drive meaningful change. HEOR and data engineering working together from the start, rather than meeting at a handoff, is one of the most powerful combinations available to us.

The most important question in outcomes research is whether we can trust the answer. Luck, in a line often attributed to Seneca, is what happens when preparation meets opportunity. The opportunity to produce trustworthy health economic evidence in Africa is here. The preparation it requires is building data infrastructure worthy of the questions we want to ask.

That work is within our control. It is time we treated it that way.

Olayinka Akerekan

Pharmacist and data engineer working at the intersection of pharmaceutical science and analytics across sub-Saharan Africa. B.Pharm, University of Ibadan. Based in Lagos, Nigeria.
