Why African Pharma Data Needs Its Own Standards

Marcus Aurelius wrote that the impediment to action advances action. I have come to believe that this applies to data just as much as it applies to life. Every broken dataset I have encountered in African pharmaceutical markets has taught me something the textbooks never mentioned.

It taught me that the standards we inherited were not built for us.

A while ago I was handed a pharmaceutical dataset from Nigeria and asked to analyse it. Before I could do anything useful, I spent three days trying to understand what I was looking at. Not because the people who collected it did a poor job. They did their job. The problem was that the framework used to organise the data had been designed for markets that look nothing like ours. It assumed things about how drugs are named, how manufacturers are tracked, and how supply chains move that simply do not hold in Lagos or Accra or Nairobi.

That experience has stayed with me. And the more I have worked in this space, the more convinced I have become that building truly useful pharmaceutical analytics for Africa requires us to stop borrowing frameworks and start building our own.

The naming problem no one warned me about

International Nonproprietary Names (INNs) exist to give every active pharmaceutical ingredient one stable, globally recognised identity. In principle, this should make African pharmaceutical data clean and interoperable. In practice, it is one of the first places things break.

I have seen the same active ingredient appear in over a hundred distinct surface forms within a single dataset. A pharmacist in Lagos writes "Amoxicillin 500mg Cap." One in Ibadan writes "AMOX 500." Another writes "Amoxil." All three refer to the same active ingredient. None of them matches on a straightforward lookup. And if your pipeline expects clean INN fields, all three records fail silently.

This is not carelessness. It is what happens when real people record real products in real conditions without a shared, enforced vocabulary. The standard assumes the vocabulary already exists. In our markets, building that vocabulary is part of the work itself.
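
To make the vocabulary idea concrete, here is a minimal sketch of resolving free-text product strings to an INN. The ALIASES table and the resolve_inn function are hypothetical names invented for this illustration, and the three entries simply mirror the examples above. A real table would be far larger and would need review by pharmacists.

```python
import re
from typing import Optional

# Hypothetical alias table (an assumption for illustration only).
# Keys are lower-cased surface forms; values are the INN they resolve to.
ALIASES = {
    "amoxicillin": "amoxicillin",  # the INN itself
    "amox": "amoxicillin",         # common abbreviation
    "amoxil": "amoxicillin",       # brand name
}

def resolve_inn(raw: str) -> Optional[str]:
    """Return the INN for a free-text product string, or None if unknown."""
    # Lower-case the text and keep only runs of letters, so strengths,
    # punctuation, and spacing differences do not affect the lookup.
    tokens = re.findall(r"[a-z]+", raw.lower())
    for token in tokens:
        if token in ALIASES:
            return ALIASES[token]
    # Returning None makes the failure explicit instead of silent.
    return None

for text in ["Amoxicillin 500mg Cap.", "AMOX 500", "Amoxil", "Unknown 10mg"]:
    print(text, "->", resolve_inn(text))
```

The point of returning None rather than passing the raw string through is that unresolved records become visible and countable, which is the first step towards growing the vocabulary.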

The problem is not that our data is dirty. The problem is that the standards measuring it were built for someone else's market.

The registry is a starting point, not a foundation

In the United States, every approved drug product has a National Drug Code. In Europe, equivalent systems exist at national and continental level. These registries anchor every data standard built on top of them. They are imperfect, but they exist, they are maintained, and they are machine-readable at scale.

Nigeria has NAFDAC. Ghana has its Food and Drugs Authority (FDA). Both maintain product registries and both are essential resources. I have used them extensively. But coverage is not complete, update cycles vary, and getting machine-readable access at the scale needed for serious analytics requires significant preprocessing before the data becomes useful.

You cannot build a reliable African pharmaceutical analytics pipeline by simply joining to the registry. The registry is where you start. It is not where you finish.

What this means in practice

A data engineer building a pan-African pharmaceutical dataset cannot join on a product code and call it done. They need a cascade of fuzzy matching, phonetic similarity, and manual review just to answer the most basic question: are these two records the same drug? That matching logic is not a workaround. It is the infrastructure.
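
The cascade described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a production matcher: match_name, phonetic_key, the 0.85 cutoff, and the three-item vocabulary are all hypothetical choices made for this example. The fuzzy stage uses difflib from the Python standard library, and the phonetic stage uses a simplified Soundex-style key rather than a full Soundex implementation.

```python
import difflib

# Simplified Soundex-style consonant groups (an assumption for this sketch).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def phonetic_key(word):
    """First letter plus up to three consonant-group digits, zero-padded."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = CODES.get(c, "")
        # Skip vowels and collapse adjacent letters from the same group.
        if digit and digit != prev:
            key += digit
        prev = digit
    return (key + "000")[:4]

def match_name(raw, vocabulary, fuzzy_cutoff=0.85):
    """Return (matched_name, stage); stage records which step decided."""
    name = raw.strip().lower()
    # Stage 1: exact match after trivial normalisation.
    if name in vocabulary:
        return name, "exact"
    # Stage 2: fuzzy string similarity.
    close = difflib.get_close_matches(name, vocabulary, n=1, cutoff=fuzzy_cutoff)
    if close:
        return close[0], "fuzzy"
    # Stage 3: phonetic similarity, accepted only when unambiguous.
    key = phonetic_key(name)
    candidates = [v for v in vocabulary if phonetic_key(v) == key]
    if len(candidates) == 1:
        return candidates[0], "phonetic"
    # Stage 4: nothing confident enough, so route to a human.
    return None, "manual_review"

vocabulary = ["amoxicillin", "paracetamol", "ciprofloxacin"]
for text in ["Amoxicillin", "amoxycillin", "parasitamole", "xyz 123"]:
    print(text, "->", match_name(text, vocabulary))
```

Recording the stage alongside the match is a deliberate choice in this sketch: it lets reviewers audit how many records were decided by each step and tune the cutoff with evidence rather than guesswork.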

The market is the data

Western pharmaceutical data standards also assume a formal, linear supply chain. Manufacturer to distributor to licensed pharmacy to patient. Clean, traceable, one direction.

Parallel importation is common across West Africa. Open drug markets exist alongside licensed dispensaries. A single product can enter a country through multiple channels, carry different batch numbers at different points in its journey, and be recorded by someone with no visibility into how it got there. This is not an anomaly to be cleaned away. This is the market. Any standard that cannot accommodate it is not fit for our context, regardless of how well it performs elsewhere.

What African pharma data standards would actually look like

Epictetus taught that we must begin from where we are, not from where we wish we were. African pharmaceutical data standards would do the same. They would begin from the data as it actually exists, with all its variation and incompleteness, and build the logic to make sense of it rather than pretending the variation should not be there.

They would build tolerance for name variation into the collection layer, not leave it to be discovered at analysis time. They would recognise that a product identifier in this context must function even when barcodes are absent, registry coverage is partial, and the same product has been recorded differently across fifty facilities. And they would be built collaboratively, with regulators, pharmacists, distributors, and the data practitioners actually constructing these pipelines, rather than adapted from Geneva or Washington and handed down as universal best practice.
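
One way an identifier could keep working without barcodes or full registry coverage is to derive it from normalised product attributes. The sketch below is only an illustration: product_key, its fields, and the 16-character truncation are assumptions made for this example, not an existing standard.

```python
import hashlib

def product_key(inn, strength_mg, form, manufacturer=None):
    """Derive a stable key from normalised attributes (hypothetical scheme)."""
    parts = [
        inn.strip().lower(),
        f"{float(strength_mg):g}",  # 500 and 500.0 both become "500"
        form.strip().lower(),
        # A missing manufacturer is recorded explicitly, not left blank.
        (manufacturer or "unknown").strip().lower(),
    ]
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]  # shortened for readability in this sketch

a = product_key("Amoxicillin", 500, "capsule")
b = product_key(" amoxicillin ", 500.0, "Capsule")
print(a == b)
```

Two facilities that record the same product with different capitalisation or spacing arrive at the same key, while a different strength or dosage form produces a different one.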

The pharmaceutical data infrastructure being built across Africa right now will shape how this continent manages medicines for a long time. The standards embedded in that infrastructure will determine what questions can be asked and what answers can be trusted.

That is too important a decision to make by default. We owe it to the patients at the end of every data point to build something designed for them.

Olayinka Akerekan

Pharmacist and data engineer working at the intersection of pharmaceutical science and analytics across sub-Saharan Africa. B.Pharm, University of Ibadan. Based in Lagos, Nigeria.
